# Regexp::Parser [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)

A Ruby gem for tokenizing, parsing, and transforming regular expressions.

* Multilayered
  * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
  * A lexer that produces a "stream" of token objects.
  * A parser that produces a "tree" of Expression objects (OO API)
* Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
* Recognizes Ruby 1.8, 1.9, and 2.x regular expressions [See Supported Syntax](#supported-syntax)

_For examples of regexp_parser in use, see [Example Projects](#example-projects)._

---

## Requirements

* Ruby >= 1.9
* Ragel >= 6.0, but only if you want to build the gem or work on the scanner.

_Note: See the .travis.yml file for covered versions._

---

## Install

Install the gem with:

`gem install regexp_parser`

Or, add it to your project's `Gemfile`:

```gem 'regexp_parser', '~> X.Y.Z'```

See rubygems for the [latest version number](https://rubygems.org/gems/regexp_parser).

---

## Usage

The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them provides a single method that takes a regular expression (as a Regexp object or a string) and returns its results. The **Lexer** and the **Parser** accept an optional second argument that specifies the syntax version, like 'ruby/2.0', which defaults to the host Ruby version (using RUBY_VERSION).

Here are the basic usage examples: ```ruby require 'regexp_parser' Regexp::Scanner.scan(regexp) Regexp::Lexer.lex(regexp) Regexp::Parser.parse(regexp) ``` All three methods accept a block as the last argument, which, if given, gets called with the results as follows: * **Scanner**: the block gets passed the results as they are scanned. See the example in the next section for details. * **Lexer**: after completion, the block gets passed the tokens one by one. _The result of the block is returned._ * **Parser**: after completion, the block gets passed the root expression. _The result of the block is returned._ --- ## Components ### Scanner A Ragel-generated scanner that recognizes the cumulative syntax of all supported syntax versions. It breaks a given expression's text into the smallest parts, and identifies their type, token, text, and start/end offsets within the pattern. #### Example The following scans the given pattern and prints out the type, token, text and start/end offsets for each token found. ```ruby require 'regexp_parser' Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/ do |type, token, text, ts, te| puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]" end # output # type: group, token: capture, text: '(' [0..1] # type: literal, token: literal, text: 'ab' [1..3] # type: quantifier, token: zero_or_one, text: '?' 
[3..4] # type: group, token: capture, text: '(' [4..5] # type: literal, token: literal, text: 'cd' [5..7] # type: group, token: close, text: ')' [7..8] # type: quantifier, token: zero_or_more, text: '*' [8..9] # type: set, token: open, text: '[' [9..10] # type: set, token: range, text: 'e-h' [10..13] # type: set, token: close, text: ']' [13..14] # type: quantifier, token: one_or_more, text: '+' [14..15] # type: group, token: close, text: ')' [15..16] ``` A one-liner that uses map on the result of the scan to return the textual parts of the pattern: ```ruby Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]} #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"] ``` #### Notes * The scanner performs basic syntax error checking, like detecting missing balancing punctuation and premature end of pattern. Flavor validity checks are performed in the lexer, which uses a syntax object. * If the input is a Ruby **Regexp** object, the scanner calls #source on it to get its string representation. #source does not include the options of the expression (m, i, and x). To include the options in the scan, #to_s should be called on the **Regexp** before passing it to the scanner or the lexer. For the parser, however, this is not necessary. It automatically exposes the options of a passed **Regexp** in the returned root expression. * To keep the scanner simple(r) and fairly reusable for other purposes, it does not perform lexical analysis on the tokens, sticking to the task of identifying the smallest possible tokens and leaving lexical analysis to the lexer. * The MRI implementation may accept expressions that either conflict with the documentation or are undocumented. The scanner does not support such implementation quirks. 
_(See issues [#3](https://github.com/ammar/regexp_parser/issues/3) and [#15](https://github.com/ammar/regexp_parser/issues/15) for examples)_ --- ### Syntax Defines the supported tokens for a specific engine implementation (aka a flavor). Syntax classes act as lookup tables, and are layered to create flavor variations. Syntax only comes into play in the lexer. #### Example The following instantiates syntax objects for Ruby 2.0, 1.9, 1.8, and checks a few of their implementation features. ```ruby require 'regexp_parser' ruby_20 = Regexp::Syntax.new 'ruby/2.0' ruby_20.implements? :quantifier, :zero_or_one # => true ruby_20.implements? :quantifier, :zero_or_one_reluctant # => true ruby_20.implements? :quantifier, :zero_or_one_possessive # => true ruby_20.implements? :conditional, :condition # => true ruby_19 = Regexp::Syntax.new 'ruby/1.9' ruby_19.implements? :quantifier, :zero_or_one # => true ruby_19.implements? :quantifier, :zero_or_one_reluctant # => true ruby_19.implements? :quantifier, :zero_or_one_possessive # => true ruby_19.implements? :conditional, :condition # => false ruby_18 = Regexp::Syntax.new 'ruby/1.8' ruby_18.implements? :quantifier, :zero_or_one # => true ruby_18.implements? :quantifier, :zero_or_one_reluctant # => true ruby_18.implements? :quantifier, :zero_or_one_possessive # => false ruby_18.implements? :conditional, :condition # => false ``` #### Notes * Variations on a token, for example a named group with angle brackets (< and >) vs one with a pair of single quotes, are specified with an underscore followed by two characters appended to the base token. In the previous named group example, the tokens would be :named_ab (angle brackets) and :named_sq (single quotes). These variations are normalized by the syntax to :named. --- ### Lexer Sits on top of the scanner and performs lexical analysis on the tokens that it emits. 
Among its tasks are; breaking quantified literal runs, collecting the emitted token attributes into Token objects, calculating their nesting depth, normalizing tokens for the parser, and checking if the tokens are implemented by the given syntax version. See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects) wiki page for more information on Token objects. #### Example The following example lexes the given pattern, checks it against the Ruby 1.9 syntax, and prints the token objects' text indented to their level. ```ruby require 'regexp_parser' Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token| puts "#{' ' * token.level}#{token.text}" end # output # a # ? # ( # b # ( # c # ) # ) # * # [ # d # ] # + ``` A one-liner that returns an array of the textual parts of the given pattern. Compare the output with that of the one-liner example of the **Scanner**; notably how the sequence 'cat' is treated. The 't' is separated because it's followed by a quantifier that only applies to it. ```ruby Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text} #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"] ``` #### Notes * The syntax argument is optional. It defaults to the version of the Ruby interpreter in use, as returned by RUBY_VERSION. * The lexer normalizes some tokens, as noted in the Syntax section above. --- ### Parser Sits on top of the lexer and transforms the "stream" of Token objects emitted by it into a tree of Expression objects represented by an instance of the Expression::Root class. See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects) wiki page for attributes and methods. 
#### Example

```ruby
require 'regexp_parser'

regex = /a?(b+(c)d)*(?<name>[0-9]+)/

tree = Regexp::Parser.parse( regex, 'ruby/2.1' )

tree.traverse do |event, exp|
  puts "#{event}: #{exp.type} `#{exp.to_s}`"
end

# Output
# visit: literal `a?`
# enter: group `(b+(c)d)*`
# visit: literal `b+`
# enter: group `(c)`
# visit: literal `c`
# exit: group `(c)`
# visit: literal `d`
# exit: group `(b+(c)d)*`
# enter: group `(?<name>[0-9]+)`
# visit: set `[0-9]+`
# exit: group `(?<name>[0-9]+)`
```

Another example, using each_expression and strfregexp to print the object tree. _See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods` for more information on these methods._

```ruby
include_root = true
indent_offset = include_root ? 1 : 0

tree.each_expression(include_root) do |exp, level_index|
  puts exp.strfregexp("%>> %c", indent_offset)
end

# Output
# > Regexp::Expression::Root
#   > Regexp::Expression::Literal
#   > Regexp::Expression::Group::Capture
#     > Regexp::Expression::Literal
#     > Regexp::Expression::Group::Capture
#       > Regexp::Expression::Literal
#     > Regexp::Expression::Literal
#   > Regexp::Expression::Group::Named
#     > Regexp::Expression::CharacterSet
```

_Note: quantifiers do not appear in the output because they are members of the Expression class.
See the next section for details._

---

## Supported Syntax

The three modules support all the regular expression syntax features of Ruby 1.8, 1.9, and 2.x:

_Note that not all of these are available in all versions of Ruby_

| Syntax Feature | Examples | ⋯ |
| ------------------------------------- | ------------------------------------------------------- |:--------:|
| **Alternation** | `a\|b\|c` | ✓ |
| **Anchors** | `\A`, `^`, `\b` | ✓ |
| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | ✓ |
| **Character Types** | `\d`, `\H`, `\s` | ✓ |
| **Cluster Types** | `\R`, `\X` | ✓ |
| **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | ✓ |
| **Escape Sequences** | `\t`, `\\+`, `\?` | ✓ |
| **Free Space** | whitespace and `# Comments` _(x modifier)_ | ✓ |
| **Grouped Exps.** | | ⋱ |
| &emsp;&nbsp;_**Assertions**_ | | ⋱ |
| &emsp;&emsp;_Lookahead_ | `(?=abc)` | ✓ |
| &emsp;&emsp;_Negative Lookahead_ | `(?!abc)` | ✓ |
| &emsp;&emsp;_Lookbehind_ | `(?<=abc)` | ✓ |
| &emsp;&emsp;_Negative Lookbehind_ | `(?<!abc)` | ✓ |
| &emsp;&nbsp;_**Atomic**_ | `(?>abc)` | ✓ |
| &emsp;&nbsp;_**Absence**_ | `(?~abc)` | ✓ |
| &emsp;&nbsp;_**Back-references**_ | | ⋱ |
| &emsp;&emsp;_Named_ | `\k<name>` | ✓ |
| &emsp;&emsp;_Nest Level_ | `\k<n-1>` | ✓ |
| &emsp;&emsp;_Numbered_ | `\k<1>` | ✓ |
| &emsp;&emsp;_Relative_ | `\k<-2>` | ✓ |
| &emsp;&emsp;_Traditional_ | `\1` thru `\9` | ✓ |
| &emsp;&nbsp;_**Capturing**_ | `(abc)` | ✓ |
| &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | ✓ |
| &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | ✓ |
| &emsp;&nbsp;_**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | ✓ |
| &emsp;&nbsp;_**Passive**_ | `(?:abc)` | ✓ |
| &emsp;&nbsp;_**Subexp. Calls**_ | `\g<name>`, `\g<1>` | ✓ |
| **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | ✓ |
| **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | ✓ |
| **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | ✓ |
| **Quantifiers** | | ⋱ |
| &emsp;&nbsp;_**Greedy**_ | `?`, `*`, `+`, `{m,M}` | ✓ |
| &emsp;&nbsp;_**Reluctant** (Lazy)_ | `??`, `*?`, `+?`, `{m,M}?` | ✓ |
| &emsp;&nbsp;_**Possessive**_ | `?+`, `*+`, `++`, `{m,M}+` | ✓ |
| **String Escapes** | | ⋱ |
| &emsp;&nbsp;_**Control**_ | `\C-C`, `\cD` | ✓ |
| &emsp;&nbsp;_**Hex**_ | `\x20`, `\x{701230}` | ✓ |
| &emsp;&nbsp;_**Meta**_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | ✓ |
| &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | ✓ |
| &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | ✓ |
| **Unicode Properties** | _([Unicode 11.0.0](http://www.unicode.org/versions/Unicode11.0.0/))_ | ⋱ |
| &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | ✓ |
| &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | ✓ |
| &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | ✓ |
| &emsp;&nbsp;_**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | ✓ |
| &emsp;&nbsp;_**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | ✓ |
| &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | ✓ |
| &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | ✓ |

##### Inapplicable Features

Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not appear in its source. Other such modifiers include the encoding modifiers `e` and `n` [See](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding). These are not seen by the scanner.

The following features are not currently enabled for Ruby by its regular expressions library (Onigmo). They are not supported by the scanner.

- **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
- **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_

See something missing?
Please submit an [issue](https://github.com/ammar/regexp_parser/issues) _**Note**: Attempting to process expressions with unsupported syntax features can raise an error, or incorrectly return tokens/objects as literals._ ## Testing To run the tests simply run rake from the root directory, as 'test' is the default task. It generates the scanner's code from the Ragel source files and runs all the tests, thus it requires Ragel to be installed. The tests use RSpec. They can also be run with the test runner that whitelists some warnings: ``` bin/test ``` You can run a specific test like so: ``` bin/test spec/scanner/properties_spec.rb ``` Note that changes to Ragel files will not be reflected when running `rspec` or `bin/test`, so you might want to run: ``` rake ragel:rb && bin/test spec/scanner/properties_spec.rb ``` ## Building Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the Ruby scanner code. The project uses the standard rubygems package tasks, so: To build the gem, run: ``` rake build ``` To install the gem from the cloned project, run: ``` rake install ``` ## Example Projects Projects using regexp_parser. - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support. - [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior. - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) uses regexp_parser to generate examples of postal codes. - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions. ## References Documentation and books used while working on this project. 
#### Ruby Flavors * Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE) * Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE) #### Regular Expressions * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/) * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html) * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz) * Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075) #### Unicode * Unicode Explained, By Jukka K. Korpela. [book](http://oreilly.com/catalog/9780596101213) * Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt) * Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt) * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/) * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/) --- ##### Copyright _Copyright (c) 2010-2019 Ammar Ali. 
See LICENSE file for details._

# regexp_parser-1.6.0/spec/parser/escapes_spec.rb
require 'spec_helper'

RSpec.describe('EscapeSequence parsing') do
  include_examples 'parse', /a\ac/, 1 => [:escape, :bell, EscapeSequence::Bell]
  include_examples 'parse', /a\ec/, 1 => [:escape, :escape, EscapeSequence::AsciiEscape]
  include_examples 'parse', /a\fc/, 1 => [:escape, :form_feed, EscapeSequence::FormFeed]
  include_examples 'parse', /a\nc/, 1 => [:escape, :newline, EscapeSequence::Newline]
  include_examples 'parse', /a\rc/, 1 => [:escape, :carriage, EscapeSequence::Return]
  include_examples 'parse', /a\tc/, 1 => [:escape, :tab, EscapeSequence::Tab]
  include_examples 'parse', /a\vc/, 1 => [:escape, :vertical_tab, EscapeSequence::VerticalTab]

  # meta character escapes
  include_examples 'parse', /a\.c/, 1 => [:escape, :dot, EscapeSequence::Literal]
  include_examples 'parse', /a\?c/, 1 => [:escape, :zero_or_one, EscapeSequence::Literal]
  include_examples 'parse', /a\*c/, 1 => [:escape, :zero_or_more, EscapeSequence::Literal]
  include_examples 'parse', /a\+c/, 1 => [:escape, :one_or_more, EscapeSequence::Literal]
  include_examples 'parse', /a\|c/, 1 => [:escape, :alternation, EscapeSequence::Literal]
  include_examples 'parse', /a\(c/, 1 => [:escape, :group_open, EscapeSequence::Literal]
  include_examples 'parse', /a\)c/, 1 => [:escape, :group_close, EscapeSequence::Literal]
  include_examples 'parse', /a\{c/, 1 => [:escape, :interval_open, EscapeSequence::Literal]
  include_examples 'parse', /a\}c/, 1 => [:escape, :interval_close, EscapeSequence::Literal]

  # unicode escapes
  include_examples 'parse', /a\u0640/, 1 => [:escape, :codepoint, EscapeSequence::Codepoint]
  include_examples 'parse', /a\u{41 1F60D}/, 1 => [:escape,
:codepoint_list, EscapeSequence::CodepointList] include_examples 'parse', /a\u{10FFFF}/, 1 => [:escape, :codepoint_list, EscapeSequence::CodepointList] # hex escapes include_examples 'parse', /a\xFF/n, 1 => [:escape, :hex, EscapeSequence::Hex] # octal escapes include_examples 'parse', /a\177/n, 1 => [:escape, :octal, EscapeSequence::Octal] specify('parse chars and codepoints') do root = RP.parse(/\n\?\101\x42\u0043\u{44 45}/) expect(root[0].char).to eq "\n" expect(root[0].codepoint).to eq 10 expect(root[1].char).to eq '?' expect(root[1].codepoint).to eq 63 expect(root[2].char).to eq 'A' expect(root[2].codepoint).to eq 65 expect(root[3].char).to eq 'B' expect(root[3].codepoint).to eq 66 expect(root[4].char).to eq 'C' expect(root[4].codepoint).to eq 67 expect(root[5].chars).to eq %w[D E] expect(root[5].codepoints).to eq [68, 69] expect { root[5].char }.to raise_error(/#chars/) expect { root[5].codepoint }.to raise_error(/#codepoints/) end specify('parse escape control sequence lower') do root = RP.parse(/a\\\c2b/) expect(root[2]).to be_instance_of(EscapeSequence::Control) expect(root[2].text).to eq '\\c2' expect(root[2].char).to eq "\x12" expect(root[2].codepoint).to eq 18 end specify('parse escape control sequence upper') do root = RP.parse(/\d\\\C-C\w/) expect(root[2]).to be_instance_of(EscapeSequence::Control) expect(root[2].text).to eq '\\C-C' expect(root[2].char).to eq "\x03" expect(root[2].codepoint).to eq 3 end specify('parse escape meta sequence') do root = RP.parse(/\Z\\\M-Z/n) expect(root[2]).to be_instance_of(EscapeSequence::Meta) expect(root[2].text).to eq '\\M-Z' expect(root[2].char).to eq "\u00DA" expect(root[2].codepoint).to eq 218 end specify('parse escape meta control sequence') do root = RP.parse(/\A\\\M-\C-X/n) expect(root[2]).to be_instance_of(EscapeSequence::MetaControl) expect(root[2].text).to eq '\\M-\\C-X' expect(root[2].char).to eq "\u0098" expect(root[2].codepoint).to eq 152 end specify('parse lower c meta control sequence') do root = 
RP.parse(/\A\\\M-\cX/n)

    expect(root[2]).to be_instance_of(EscapeSequence::MetaControl)
    expect(root[2].text).to eq '\\M-\\cX'
    expect(root[2].char).to eq "\u0098"
    expect(root[2].codepoint).to eq 152
  end

  specify('parse escape reverse meta control sequence') do
    root = RP.parse(/\A\\\C-\M-X/n)

    expect(root[2]).to be_instance_of(EscapeSequence::MetaControl)
    expect(root[2].text).to eq '\\C-\\M-X'
    expect(root[2].char).to eq "\u0098"
    expect(root[2].codepoint).to eq 152
  end

  specify('parse escape reverse lower c meta control sequence') do
    root = RP.parse(/\A\\\c\M-X/n)

    expect(root[2]).to be_instance_of(EscapeSequence::MetaControl)
    expect(root[2].text).to eq '\\c\\M-X'
    expect(root[2].char).to eq "\u0098"
    expect(root[2].codepoint).to eq 152
  end
end

# regexp_parser-1.6.0/spec/parser/conditionals_spec.rb
require 'spec_helper'

RSpec.describe('Conditional parsing') do
  specify('parse conditional') do
    regexp = /(?<A>a)(?(<A>)T|F)/

    root = RP.parse(regexp, 'ruby/2.0')
    exp = root[1]

    expect(exp).to be_a(Conditional::Expression)
    expect(exp.type).to eq :conditional
    expect(exp.token).to eq :open
    expect(exp.to_s).to eq '(?(<A>)T|F)'
    expect(exp.reference).to eq 'A'
  end

  specify('parse conditional condition') do
    regexp = /(?<A>a)(?(<A>)T|F)/

    root = RP.parse(regexp, 'ruby/2.0')
    exp = root[1].condition

    expect(exp).to be_a(Conditional::Condition)
    expect(exp.type).to eq :conditional
    expect(exp.token).to eq :condition
    expect(exp.to_s).to eq '(<A>)'
    expect(exp.reference).to eq 'A'
    expect(exp.referenced_expression.to_s).to eq '(?<A>a)'
  end

  specify('parse conditional condition with number ref') do
    regexp = /(a)(?(1)T|F)/

    root = RP.parse(regexp, 'ruby/2.0')
    exp = root[1].condition

    expect(exp).to be_a(Conditional::Condition)
    expect(exp.type).to eq :conditional
    expect(exp.token).to eq :condition
    expect(exp.to_s).to eq '(1)'
    expect(exp.reference).to eq 1
    expect(exp.referenced_expression.to_s).to eq '(a)'
  end

  specify('parse conditional nested groups') do
regexp = /((a)|(b)|((?(2)(c(d|e)+)?|(?(3)f|(?(4)(g|(h)(i)))))))/ root = RP.parse(regexp, 'ruby/2.0') expect(root.to_s).to eq regexp.source group = root.first expect(group).to be_instance_of(Group::Capture) alt = group.first expect(alt).to be_instance_of(Alternation) expect(alt.length).to eq 3 expect(alt.map(&:first)).to all(be_a Group::Capture) subgroup = alt[2].first conditional = subgroup.first expect(conditional).to be_instance_of(Conditional::Expression) expect(conditional.length).to eq 3 expect(conditional[0]).to be_instance_of(Conditional::Condition) expect(conditional[0].to_s).to eq '(2)' condition = conditional.condition expect(condition).to be_instance_of(Conditional::Condition) expect(condition.to_s).to eq '(2)' branches = conditional.branches expect(branches.length).to eq 2 expect(branches).to be_instance_of(Array) end specify('parse conditional nested') do regexp = /(a(b(c(d)(e))))(?(1)(?(2)d|(?(3)e|f))|(?(4)(?(5)g|h)))/ root = RP.parse(regexp, 'ruby/2.0') expect(root.to_s).to eq regexp.source { 1 => [2, root[1]], 2 => [2, root[1][1][0]], 3 => [2, root[1][1][0][2][0]], 4 => [1, root[1][2][0]], 5 => [2, root[1][2][0][1][0]] }.each do |index, example| branch_count, exp = example expect(exp).to be_instance_of(Conditional::Expression) expect(exp.condition.to_s).to eq "(#{index})" expect(exp.branches.length).to eq branch_count end end specify('parse conditional nested alternation') do regexp = /(a)(?(1)(b|c|d)|(e|f|g))(h)(?(2)(i|j|k)|(l|m|n))|o|p/ root = RP.parse(regexp, 'ruby/2.0') expect(root.to_s).to eq regexp.source expect(root.first).to be_instance_of(Alternation) [ [3, 'b|c|d', root[0][0][1][1][0][0]], [3, 'e|f|g', root[0][0][1][2][0][0]], [3, 'i|j|k', root[0][0][3][1][0][0]], [3, 'l|m|n', root[0][0][3][2][0][0]] ].each do |example| alt_count, alt_text, exp = example expect(exp).to be_instance_of(Alternation) expect(exp.to_s).to eq alt_text expect(exp.alternatives.length).to eq alt_count end end specify('parse conditional extra separator') do regexp = 
/(?<A>a)(?(<A>)T|)/

    root = RP.parse(regexp, 'ruby/2.0')
    branches = root[1].branches

    expect(branches.length).to eq 2

    seq_1, seq_2 = branches

    [seq_1, seq_2].each do |seq|
      expect(seq).to be_a(Sequence)
      expect(seq.type).to eq :expression
      expect(seq.token).to eq :sequence
    end

    expect(seq_1.to_s).to eq 'T'
    expect(seq_2.to_s).to eq ''
  end

  specify('parse conditional quantified') do
    regexp = /(foo)(?(1)\d|(\w)){42}/

    root = RP.parse(regexp, 'ruby/2.0')
    conditional = root[1]

    expect(conditional).to be_quantified
    expect(conditional.quantifier.to_s).to eq '{42}'
    expect(conditional.to_s).to eq '(?(1)\\d|(\\w)){42}'
    expect(conditional.branches.any?(&:quantified?)).to be false
  end

  specify('parse conditional branch content quantified') do
    regexp = /(foo)(?(1)\d{23}|(\w){42})/

    root = RP.parse(regexp, 'ruby/2.0')
    conditional = root[1]

    expect(conditional).not_to be_quantified
    expect(conditional.branches.any?(&:quantified?)).to be false
    expect(conditional.branches[0][0]).to be_quantified
    expect(conditional.branches[0][0].quantifier.to_s).to eq '{23}'
    expect(conditional.branches[1][0]).to be_quantified
    expect(conditional.branches[1][0].quantifier.to_s).to eq '{42}'
  end

  specify('parse conditional excessive branches') do
    regexp = '(?<A>a)(?(<A>)T|F|X)'

    expect { RP.parse(regexp, 'ruby/2.0') }.to raise_error(Conditional::TooManyBranches)
  end
end

# regexp_parser-1.6.0/spec/parser/free_space_spec.rb
require 'spec_helper'

RSpec.describe('FreeSpace parsing') do
  specify('parse free space spaces') do
    regexp = /a ? b * c + d{2,4}/x
    root = RP.parse(regexp)

    0.upto(6) do |i|
      if i.odd?
expect(root[i]).to be_instance_of(WhiteSpace) expect(root[i].text).to eq ' ' else expect(root[i]).to be_instance_of(Literal) expect(root[i]).to be_quantified end end end specify('parse non free space literals') do regexp = /a b c d/ root = RP.parse(regexp) expect(root.first).to be_instance_of(Literal) expect(root.first.text).to eq 'a b c d' end specify('parse free space comments') do regexp = / a ? # One letter b {2,5} # Another one [c-g] + # A set (h|i|j) | # A group klm * nop + /x root = RP.parse(regexp) alt = root.first expect(alt).to be_instance_of(Alternation) alt_1 = alt.alternatives.first expect(alt_1).to be_instance_of(Alternative) expect(alt_1.length).to eq 15 [0, 2, 4, 6, 8, 12, 14].each do |i| expect(alt_1[i]).to be_instance_of(WhiteSpace) end [3, 7, 11].each { |i| expect(alt_1[i].class).to eq Comment } alt_2 = alt.alternatives.last expect(alt_2).to be_instance_of(Alternative) expect(alt_2.length).to eq 7 [0, 2, 4, 6].each { |i| expect(alt_2[i].class).to eq WhiteSpace } expect(alt_2[1]).to be_instance_of(Comment) end specify('parse free space nested comments') do regexp = / # Group one ( abc # Comment one \d? # Optional \d )+ # Group two ( def # Comment two \s? # Optional \s )? /x root = RP.parse(regexp) top_comment_1 = root[1] expect(top_comment_1).to be_instance_of(Comment) expect(top_comment_1.text).to eq "# Group one\n" expect(top_comment_1.starts_at).to eq 7 top_comment_2 = root[5] expect(top_comment_2).to be_instance_of(Comment) expect(top_comment_2.text).to eq "# Group two\n" expect(top_comment_2.starts_at).to eq 95 [3, 7].each do |g,| group = root[g] [3, 7].each do |c| comment = group[c] expect(comment).to be_instance_of(Comment) expect(comment.text.length).to eq 14 end end end specify('parse free space quantifiers') do regexp = / a # comment 1 ? 
( b # comment 2 # comment 3 + ) # comment 4 * /x root = RP.parse(regexp) literal_1 = root[1] expect(literal_1).to be_instance_of(Literal) expect(literal_1).to be_quantified expect(literal_1.quantifier.token).to eq :zero_or_one group = root[5] expect(group).to be_instance_of(Group::Capture) expect(group).to be_quantified expect(group.quantifier.token).to eq :zero_or_more literal_2 = group[1] expect(literal_2).to be_instance_of(Literal) expect(literal_2).to be_quantified expect(literal_2.quantifier.token).to eq :one_or_more end end regexp_parser-1.6.0/spec/parser/all_spec.rb0000644000004100000410000000234213541126476020763 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Parser) do specify('parse returns a root expression') do expect(RP.parse('abc')).to be_instance_of(Root) end specify('parse can be called with block') do expect(RP.parse('abc') { |root| root.class }).to eq Root end specify('parse root contains expressions') do root = RP.parse(/^a.c+[^one]{2,3}\b\d\\\C-C$/) expect(root.expressions).to all(be_a Regexp::Expression::Base) end specify('parse root options mi') do root = RP.parse(/[abc]/mi, 'ruby/1.8') expect(root.m?).to be true expect(root.i?).to be true expect(root.x?).to be false end specify('parse node types') do root = RP.parse('^(one){2,3}([^d\\]efm-qz\\,\\-]*)(ghi)+$') expect(root[1][0]).to be_a(Literal) expect(root[1]).to be_quantified expect(root[2][0]).to be_a(CharacterSet) expect(root[2]).not_to be_quantified expect(root[3]).to be_a(Group::Capture) expect(root[3]).to be_quantified end specify('parse no quantifier target raises error') do expect { RP.parse('?abc') }.to raise_error(ArgumentError) end specify('parse sequence no quantifier target raises error') do expect { RP.parse('abc|?def') }.to raise_error(ArgumentError) end end regexp_parser-1.6.0/spec/parser/keep_spec.rb0000644000004100000410000000035313541126476021137 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Keep parsing') do include_examples 'parse', 
/ab\Kcd/, 1 => [:keep, :mark, Keep::Mark, text: '\K'] include_examples 'parse', /(a\K)/, [0, 1] => [:keep, :mark, Keep::Mark, text: '\K'] end regexp_parser-1.6.0/spec/parser/sets_spec.rb0000644000004100000410000001106613541126476021174 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('CharacterSet parsing') do specify('parse set basic') do root = RP.parse('[ab]+') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 2 expect(exp[0]).to be_instance_of(Literal) expect(exp[0].text).to eq 'a' expect(exp[1]).to be_instance_of(Literal) expect(exp[1].text).to eq 'b' expect(exp).to be_quantified expect(exp.quantifier.min).to eq 1 expect(exp.quantifier.max).to eq(-1) end specify('parse set char type') do root = RP.parse('[a\\dc]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 3 expect(exp[1]).to be_instance_of(CharacterType::Digit) expect(exp[1].text).to eq '\\d' end specify('parse set escape sequence backspace') do root = RP.parse('[a\\bc]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 3 expect(exp[1]).to be_instance_of(EscapeSequence::Backspace) expect(exp[1].text).to eq '\\b' expect(exp).to match 'a' expect(exp).to match "\b" expect(exp).not_to match 'b' expect(exp).to match 'c' end specify('parse set escape sequence hex') do root = RP.parse('[a\\x20c]', :any) exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 3 expect(exp[1]).to be_instance_of(EscapeSequence::Hex) expect(exp[1].text).to eq '\\x20' end specify('parse set escape sequence codepoint') do root = RP.parse('[a\\u0640]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 2 expect(exp[1]).to be_instance_of(EscapeSequence::Codepoint) expect(exp[1].text).to eq '\\u0640' end specify('parse set escape sequence codepoint list') do root = RP.parse('[a\\u{41 1F60D}]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to 
eq 2 expect(exp[1]).to be_instance_of(EscapeSequence::CodepointList) expect(exp[1].text).to eq '\\u{41 1F60D}' end specify('parse set posix class') do root = RP.parse('[[:digit:][:^lower:]]+') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 2 expect(exp[0]).to be_instance_of(PosixClass) expect(exp[0].text).to eq '[:digit:]' expect(exp[1]).to be_instance_of(PosixClass) expect(exp[1].text).to eq '[:^lower:]' end specify('parse set nesting') do root = RP.parse('[a[b[c]d]e]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 3 expect(exp[0]).to be_instance_of(Literal) expect(exp[2]).to be_instance_of(Literal) subset1 = exp[1] expect(subset1).to be_instance_of(CharacterSet) expect(subset1.count).to eq 3 expect(subset1[0]).to be_instance_of(Literal) expect(subset1[2]).to be_instance_of(Literal) subset2 = subset1[1] expect(subset2).to be_instance_of(CharacterSet) expect(subset2.count).to eq 1 expect(subset2[0]).to be_instance_of(Literal) end specify('parse set nesting negative') do root = RP.parse('[a[^b[c]]]') exp = root[0] expect(exp).to be_instance_of(CharacterSet) expect(exp.count).to eq 2 expect(exp[0]).to be_instance_of(Literal) expect(exp).not_to be_negative subset1 = exp[1] expect(subset1).to be_instance_of(CharacterSet) expect(subset1.count).to eq 2 expect(subset1[0]).to be_instance_of(Literal) expect(subset1).to be_negative subset2 = subset1[1] expect(subset2).to be_instance_of(CharacterSet) expect(subset2.count).to eq 1 expect(subset2[0]).to be_instance_of(Literal) expect(subset2).not_to be_negative end specify('parse set nesting #to_s') do pattern = '[a[b[^c]]]' root = RP.parse(pattern) expect(root.to_s).to eq pattern end specify('parse set literals are not merged') do root = RP.parse("[#{('a' * 10)}]") exp = root[0] expect(exp.count).to eq 10 end specify('parse set whitespace is not merged') do root = RP.parse("[#{(' ' * 10)}]") exp = root[0] expect(exp.count).to eq 10 end specify('parse set 
whitespace is not merged in x mode') do root = RP.parse("(?x)[#{(' ' * 10)}]") exp = root[1] expect(exp.count).to eq 10 end specify('parse set collating sequence') do root = RP.parse('[a[.span-ll.]h]', :any) exp = root[0] expect(exp[1].to_s).to eq '[.span-ll.]' end specify('parse set character equivalents') do root = RP.parse('[a[=e=]h]', :any) exp = root[0] expect(exp[1].to_s).to eq '[=e=]' end end regexp_parser-1.6.0/spec/parser/quantifiers_spec.rb0000644000004100000410000000526013541126476022547 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Quantifier parsing') do RSpec.shared_examples 'quantifier' do |pattern, text, mode, token, min, max| it "parses the quantifier in #{pattern} as #{mode} #{token}" do root = RP.parse(pattern, '*') exp = root[0] expect(exp).to be_quantified expect(exp.quantifier.token).to eq token expect(exp.quantifier.min).to eq min expect(exp.quantifier.max).to eq max expect(exp.quantifier.mode).to eq mode end end include_examples 'quantifier', /a?b/, '?', :greedy, :zero_or_one, 0, 1 include_examples 'quantifier', /a??b/, '??', :reluctant, :zero_or_one, 0, 1 include_examples 'quantifier', /a?+b/, '?+', :possessive, :zero_or_one, 0, 1 include_examples 'quantifier', /a*b/, '*', :greedy, :zero_or_more, 0, -1 include_examples 'quantifier', /a*?b/, '*?', :reluctant, :zero_or_more, 0, -1 include_examples 'quantifier', /a*+b/, '*+', :possessive, :zero_or_more, 0, -1 include_examples 'quantifier', /a+b/, '+', :greedy, :one_or_more, 1, -1 include_examples 'quantifier', /a+?b/, '+?', :reluctant, :one_or_more, 1, -1 include_examples 'quantifier', /a++b/, '++', :possessive, :one_or_more, 1, -1 include_examples 'quantifier', /a{2,4}b/, '{2,4}', :greedy, :interval, 2, 4 include_examples 'quantifier', /a{2,4}?b/, '{2,4}?', :reluctant, :interval, 2, 4 include_examples 'quantifier', /a{2,4}+b/, '{2,4}+', :possessive, :interval, 2, 4 include_examples 'quantifier', /a{2,}b/, '{2,}', :greedy, :interval, 2, -1 include_examples 'quantifier', 
/a{2,}?b/, '{2,}?', :reluctant, :interval, 2, -1 include_examples 'quantifier', /a{2,}+b/, '{2,}+', :possessive, :interval, 2, -1 include_examples 'quantifier', /a{,3}b/, '{,3}', :greedy, :interval, 0, 3 include_examples 'quantifier', /a{,3}?b/, '{,3}?', :reluctant, :interval, 0, 3 include_examples 'quantifier', /a{,3}+b/, '{,3}+', :possessive, :interval, 0, 3 include_examples 'quantifier', /a{4}b/, '{4}', :greedy, :interval, 4, 4 include_examples 'quantifier', /a{4}?b/, '{4}?', :reluctant, :interval, 4, 4 include_examples 'quantifier', /a{4}+b/, '{4}+', :possessive, :interval, 4, 4 specify('mode-checking methods') do exp = RP.parse(/a??/).first expect(exp).to be_reluctant expect(exp).to be_lazy expect(exp).not_to be_greedy expect(exp).not_to be_possessive expect(exp.quantifier).to be_reluctant expect(exp.quantifier).to be_lazy expect(exp.quantifier).not_to be_greedy expect(exp.quantifier).not_to be_possessive end end regexp_parser-1.6.0/spec/parser/alternation_spec.rb0000644000004100000410000000402713541126476022535 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Alternation parsing') do let(:root) { RP.parse('(ab??|cd*|ef+)*|(gh|ij|kl)?') } specify('parse alternation root') do e = root[0] expect(e).to be_a(Alternation) end specify('parse alternation alts') do alts = root[0].alternatives expect(alts[0]).to be_a(Alternative) expect(alts[1]).to be_a(Alternative) expect(alts[0][0]).to be_a(Group::Capture) expect(alts[1][0]).to be_a(Group::Capture) expect(alts.length).to eq 2 end specify('parse alternation nested') do e = root[0].alternatives[0][0][0] expect(e).to be_a(Alternation) end specify('parse alternation nested sequence') do alts = root[0][0] nested = alts[0][0][0] expect(nested).to be_a(Alternative) expect(nested[0]).to be_a(Literal) expect(nested[1]).to be_a(Literal) expect(nested.expressions.length).to eq 2 end specify('parse alternation nested groups') do root = RP.parse('(i|ey|([ougfd]+)|(ney))') alts = root[0][0].alternatives 
expect(alts.length).to eq 4 end specify('parse alternation grouped alts') do root = RP.parse('ca((n)|(t)|(ll)|(b))') alts = root[1][0].alternatives expect(alts.length).to eq 4 expect(alts[0]).to be_a(Alternative) expect(alts[1]).to be_a(Alternative) expect(alts[2]).to be_a(Alternative) expect(alts[3]).to be_a(Alternative) end specify('parse alternation nested grouped alts') do root = RP.parse('ca((n|t)|(ll|b))') alts = root[1][0].alternatives expect(alts.length).to eq 2 expect(alts[0]).to be_a(Alternative) expect(alts[1]).to be_a(Alternative) subalts = root[1][0][0][0][0].alternatives expect(subalts.length).to eq 2 expect(subalts[0]).to be_a(Alternative) expect(subalts[1]).to be_a(Alternative) end specify('parse alternation continues after nesting') do root = RP.parse(/a|(b)c/) seq = root[0][1].expressions expect(seq.length).to eq 2 expect(seq[0]).to be_a(Group::Capture) expect(seq[1]).to be_a(Literal) end end regexp_parser-1.6.0/spec/parser/refcalls_spec.rb0000644000004100000410000001111513541126476022004 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Refcall parsing') do include_examples 'parse', /(abc)\1/, 1 => [:backref, :number, Backreference::Number, number: 1] include_examples 'parse', /(?<X>abc)\k<X>/, 1 => [:backref, :name_ref, Backreference::Name, name: 'X'] include_examples 'parse', /(?<X>abc)\k'X'/, 1 => [:backref, :name_ref, Backreference::Name, name: 'X'] include_examples 'parse', /(abc)\k<1>/, 1 => [:backref, :number_ref, Backreference::Number, number: 1] include_examples 'parse', /(abc)\k'1'/, 1 => [:backref, :number_ref, Backreference::Number, number: 1] include_examples 'parse', /(abc)\k<-1>/, 1 => [:backref, :number_rel_ref, Backreference::NumberRelative, number: -1] include_examples 'parse', /(abc)\k'-1'/, 1 => [:backref, :number_rel_ref, Backreference::NumberRelative, number: -1] include_examples 'parse', /(?<X>abc)\g<X>/, 1 => [:backref, :name_call, Backreference::NameCall, name: 'X'] include_examples 'parse', /(?<X>abc)\g'X'/, 1 => [:backref, 
:name_call, Backreference::NameCall, name: 'X'] include_examples 'parse', /(abc)\g<1>/, 1 => [:backref, :number_call, Backreference::NumberCall, number: 1] include_examples 'parse', /(abc)\g'1'/, 1 => [:backref, :number_call, Backreference::NumberCall, number: 1] include_examples 'parse', /(abc)\g<-1>/, 1 => [:backref, :number_rel_call, Backreference::NumberCallRelative, number: -1] include_examples 'parse', /(abc)\g'-1'/, 1 => [:backref, :number_rel_call, Backreference::NumberCallRelative, number: -1] include_examples 'parse', /\g<+1>(abc)/, 0 => [:backref, :number_rel_call, Backreference::NumberCallRelative, number: 1] include_examples 'parse', /\g'+1'(abc)/, 0 => [:backref, :number_rel_call, Backreference::NumberCallRelative, number: 1] include_examples 'parse', /(?<X>abc)\k<X-0>/, 1 => [:backref, :name_recursion_ref, Backreference::NameRecursionLevel, name: 'X', recursion_level: 0] include_examples 'parse', /(?<X>abc)\k'X-0'/, 1 => [:backref, :name_recursion_ref, Backreference::NameRecursionLevel, name: 'X', recursion_level: 0] include_examples 'parse', /(abc)\k<1-0>/, 1 => [:backref, :number_recursion_ref, Backreference::NumberRecursionLevel, number: 1, recursion_level: 0] include_examples 'parse', /(abc)\k'1-0'/, 1 => [:backref, :number_recursion_ref, Backreference::NumberRecursionLevel, number: 1, recursion_level: 0] include_examples 'parse', /(abc)\k'-1+0'/, 1 => [:backref, :number_recursion_ref, Backreference::NumberRecursionLevel, number: -1, recursion_level: 0] include_examples 'parse', /(abc)\k'1+1'/, 1 => [:backref, :number_recursion_ref, Backreference::NumberRecursionLevel, number: 1, recursion_level: 1] include_examples 'parse', /(abc)\k'1-1'/, 1 => [:backref, :number_recursion_ref, Backreference::NumberRecursionLevel, number: 1, recursion_level: -1] specify('parse backref effective_number') do root = RP.parse('(abc)(def)\\k<-1>(ghi)\\k<-3>\\k<-1>', 'ruby/1.9') exp1 = root[2] exp2 = root[4] exp3 = root[5] expect([exp1, exp2, exp3]).to all 
be_instance_of(Backreference::NumberRelative) expect(exp1.effective_number).to eq 2 expect(exp2.effective_number).to eq 1 expect(exp3.effective_number).to eq 3 end specify('parse backref referenced_expression') do root = RP.parse('(abc)(def)\\k<-1>(ghi)\\k<-3>\\k<-1>', 'ruby/1.9') exp1 = root[2] exp2 = root[4] exp3 = root[5] expect([exp1, exp2, exp3]).to all be_instance_of(Backreference::NumberRelative) expect(exp1.referenced_expression.to_s).to eq '(def)' expect(exp2.referenced_expression.to_s).to eq '(abc)' expect(exp3.referenced_expression.to_s).to eq '(ghi)' end specify('parse backref call effective_number') do root = RP.parse('\\g<+1>(abc)\\g<+2>(def)(ghi)\\g<-2>', 'ruby/1.9') exp1 = root[0] exp2 = root[2] exp3 = root[5] expect([exp1, exp2, exp3]).to all be_instance_of(Backreference::NumberCallRelative) expect(exp1.effective_number).to eq 1 expect(exp2.effective_number).to eq 3 expect(exp3.effective_number).to eq 2 end specify('parse backref call referenced_expression') do root = RP.parse('\\g<+1>(abc)\\g<+2>(def)(ghi)\\g<-2>', 'ruby/1.9') exp1 = root[0] exp2 = root[2] exp3 = root[5] expect([exp1, exp2, exp3]).to all be_instance_of(Backreference::NumberCallRelative) expect(exp1.referenced_expression.to_s).to eq '(abc)' expect(exp2.referenced_expression.to_s).to eq '(ghi)' expect(exp3.referenced_expression.to_s).to eq '(def)' end end regexp_parser-1.6.0/spec/parser/errors_spec.rb0000644000004100000410000000221213541126476021523 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Parsing errors') do let(:parser) { Regexp::Parser.new } before { parser.parse(/foo/) } # initializes ivars it('raises UnknownTokenTypeError for unknown token types') do expect { parser.send(:parse_token, Regexp::Token.new(:foo, :bar)) } .to raise_error(Regexp::Parser::UnknownTokenTypeError) end RSpec.shared_examples 'UnknownTokenError' do |type, token| it "raises for unknown tokens of type #{type}" do expect { parser.send(:parse_token, Regexp::Token.new(type, :foo)) } .to 
raise_error(Regexp::Parser::UnknownTokenError) end end include_examples 'UnknownTokenError', :anchor include_examples 'UnknownTokenError', :backref include_examples 'UnknownTokenError', :conditional include_examples 'UnknownTokenError', :free_space include_examples 'UnknownTokenError', :group include_examples 'UnknownTokenError', :meta include_examples 'UnknownTokenError', :nonproperty include_examples 'UnknownTokenError', :property include_examples 'UnknownTokenError', :quantifier include_examples 'UnknownTokenError', :set include_examples 'UnknownTokenError', :type end regexp_parser-1.6.0/spec/parser/types_spec.rb0000644000004100000410000000171713541126476021364 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('CharacterType parsing') do include_examples 'parse', /a\dc/, 1 => [:type, :digit, CharacterType::Digit] include_examples 'parse', /a\Dc/, 1 => [:type, :nondigit, CharacterType::NonDigit] include_examples 'parse', /a\sc/, 1 => [:type, :space, CharacterType::Space] include_examples 'parse', /a\Sc/, 1 => [:type, :nonspace, CharacterType::NonSpace] include_examples 'parse', /a\hc/, 1 => [:type, :hex, CharacterType::Hex] include_examples 'parse', /a\Hc/, 1 => [:type, :nonhex, CharacterType::NonHex] include_examples 'parse', /a\wc/, 1 => [:type, :word, CharacterType::Word] include_examples 'parse', /a\Wc/, 1 => [:type, :nonword, CharacterType::NonWord] include_examples 'parse', 'a\\Rc', 1 => [:type, :linebreak, CharacterType::Linebreak] include_examples 'parse', 'a\\Xc', 1 => [:type, :xgrapheme, CharacterType::ExtendedGrapheme] end regexp_parser-1.6.0/spec/parser/properties_spec.rb0000644000004100000410000000615313541126476022413 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Property parsing') do example_props = [ 'Alnum', 'Any', 'Age=1.1', 'Dash', 'di', 'Default_Ignorable_Code_Point', 'Math', 'Noncharacter-Code_Point', # test dash 'sd', 'Soft Dotted', # test whitespace 'sterm', 'xidc', 'XID_Continue', 'Emoji', 'InChessSymbols' ] 
example_props.each do |name| it("parses property #{name}") do exp = RP.parse("ab\\p{#{name}}", '*').last expect(exp).to be_a(UnicodeProperty::Base) expect(exp.type).to eq :property expect(exp.name).to eq name end it("parses nonproperty #{name}") do exp = RP.parse("ab\\P{#{name}}", '*').last expect(exp).to be_a(UnicodeProperty::Base) expect(exp.type).to eq :nonproperty expect(exp.name).to eq name end end specify('parse all properties of current ruby') do unsupported = RegexpPropertyValues.all_for_current_ruby.reject do |prop| RP.parse("\\p{#{prop}}") rescue false end expect(unsupported).to be_empty end specify('parse property negative') do root = RP.parse('ab\p{L}cd', 'ruby/1.9') expect(root[1]).not_to be_negative end specify('parse nonproperty negative') do root = RP.parse('ab\P{L}cd', 'ruby/1.9') expect(root[1]).to be_negative end specify('parse caret nonproperty negative') do root = RP.parse('ab\p{^L}cd', 'ruby/1.9') expect(root[1]).to be_negative end specify('parse double negated property negative') do root = RP.parse('ab\P{^L}cd', 'ruby/1.9') expect(root[1]).not_to be_negative end specify('parse property shortcut') do expect(RP.parse('\p{lowercase_letter}')[0].shortcut).to eq 'll' expect(RP.parse('\p{sc}')[0].shortcut).to eq 'sc' expect(RP.parse('\p{in_bengali}')[0].shortcut).to be_nil end specify('parse property age') do root = RP.parse('ab\p{age=5.2}cd', 'ruby/1.9') expect(root[1]).to be_a(UnicodeProperty::Age) end specify('parse property derived') do root = RP.parse('ab\p{Math}cd', 'ruby/1.9') expect(root[1]).to be_a(UnicodeProperty::Derived) end specify('parse property script') do root = RP.parse('ab\p{Hiragana}cd', 'ruby/1.9') expect(root[1]).to be_a(UnicodeProperty::Script) end specify('parse property script V1 9 3') do root = RP.parse('ab\p{Brahmi}cd', 'ruby/1.9.3') expect(root[1]).to be_a(UnicodeProperty::Script) end specify('parse property script V2 2 0') do root = RP.parse('ab\p{Caucasian_Albanian}cd', 'ruby/2.2') expect(root[1]).to 
be_a(UnicodeProperty::Script) end specify('parse property block') do root = RP.parse('ab\p{InArmenian}cd', 'ruby/1.9') expect(root[1]).to be_a(UnicodeProperty::Block) end specify('parse property following literal') do root = RP.parse('ab\p{Lu}cd', 'ruby/1.9') expect(root[2]).to be_a(Literal) end specify('parse abandoned newline property') do root = RP.parse('\p{newline}', 'ruby/1.9') expect(root.expressions.last).to be_a(UnicodeProperty::Base) expect { RP.parse('\p{newline}', 'ruby/2.0') } .to raise_error(Regexp::Syntax::NotImplementedError) end end regexp_parser-1.6.0/spec/parser/posix_classes_spec.rb0000644000004100000410000000065713541126476023101 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('PosixClass parsing') do include_examples 'parse', /[[:word:]]/, [0, 0] => [:posixclass, :word, PosixClass, name: 'word', text: '[:word:]', negative?: false] include_examples 'parse', /[[:^word:]]/, [0, 0] => [:nonposixclass, :word, PosixClass, name: 'word', text: '[:^word:]', negative?: true] end regexp_parser-1.6.0/spec/parser/groups_spec.rb0000644000004100000410000001236513541126476021540 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Group parsing') do include_examples 'parse', /(?=abc)(?!def)/, 0 => [:assertion, :lookahead, Assertion::Lookahead], 1 => [:assertion, :nlookahead, Assertion::NegativeLookahead] include_examples 'parse', /(?<=abc)(?<!def)/, 0 => 
[:assertion, :lookbehind, Assertion::Lookbehind], 1 => [:assertion, :nlookbehind, Assertion::NegativeLookbehind] include_examples 'parse', /a(?# is for apple)b(?# for boy)c(?# cat)/, 1 => [:group, :comment, Group::Comment], 3 => [:group, :comment, Group::Comment], 5 => [:group, :comment, Group::Comment] if ruby_version_at_least('2.4.1') include_examples 'parse', 'a(?~b)c(?~d)e', 1 => [:group, :absence, Group::Absence], 3 => [:group, :absence, Group::Absence] end include_examples 'parse', /(?m:a)/, 0 => [:group, :options, Group::Options, options: { m: true }, option_changes: { m: true }] # self-defeating group option include_examples 'parse', /(?m-m:a)/, 0 => [:group, :options, Group::Options, options: {}, option_changes: { m: false }] # activate one option in nested group include_examples 'parse', /(?x-mi:a(?m:b))/, 0 => [:group, :options, Group::Options, options: { x: true }, option_changes: { i: false, m: false, x: true }], [0, 1] => [:group, :options, Group::Options, options: { m: true, x: true }, option_changes: { m: true }] # deactivate one option in nested group include_examples 'parse', /(?ix-m:a(?-i:b))/, 0 => [:group, :options, Group::Options, options: { i: true, x: true }, option_changes: { i: true, m: false, x: true }], [0, 1] => [:group, :options, Group::Options, options: { x: true }, option_changes: { i: false }] # invert all options in nested group include_examples 'parse', /(?xi-m:a(?m-ix:b))/, 0 => [:group, :options, Group::Options, options: { i: true, x: true }, option_changes: { i: true, m: false, x: true }], [0, 1] => [:group, :options, Group::Options, options: { m: true }, option_changes: { i: false, m: true, x: false }] # nested options affect literal subexpressions include_examples 'parse', /(?x-mi:a(?m:b))/, [0, 0] => [:literal, :literal, Literal, text: 'a', options: { x: true }], [0, 1, 0] => [:literal, :literal, Literal, text: 'b', options: { m: true, x: true }] # option switching group include_examples 'parse', /a(?i-m)b/m, 0 => [:literal, 
:literal, Literal, text: 'a', options: { m: true }], 1 => [:group, :options_switch, Group::Options, options: { i: true }, option_changes: { i: true, m: false }], 2 => [:literal, :literal, Literal, text: 'b', options: { i: true }] # option switch in group include_examples 'parse', /(a(?i-m)b)c/m, 0 => [:group, :capture, Group::Capture, options: { m: true }], [0, 0] => [:literal, :literal, Literal, text: 'a', options: { m: true }], [0, 1] => [:group, :options_switch, Group::Options, options: { i: true }, option_changes: { i: true, m: false }], [0, 2] => [:literal, :literal, Literal, text: 'b', options: { i: true }], 1 => [:literal, :literal, Literal, text: 'c', options: { m: true }] # nested option switch in group include_examples 'parse', /((?i-m)(a(?-i)b))/m, [0, 1] => [:group, :capture, Group::Capture, options: { i: true }], [0, 1, 0] => [:literal, :literal, Literal, text: 'a', options: { i: true }], [0, 1, 1] => [:group, :options_switch, Group::Options, options: {}, option_changes: { i: false }], [0, 1, 2] => [:literal, :literal, Literal, text: 'b', options: {}] # options dau include_examples 'parse', /(?dua:abc)/, 0 => [:group, :options, Group::Options, options: { a: true }, option_changes: { a: true }] # nested options dau include_examples 'parse', /(?u:a(?d:b))/, 0 => [:group, :options, Group::Options, options: { u: true }, option_changes: { u: true }], [0, 1] => [:group, :options, Group::Options, options: { d: true }, option_changes: { d: true, u: false }], [0, 1, 0] => [:literal, :literal, Literal, text: 'b', options: { d: true }] # nested options da include_examples 'parse', /(?di-xm:a(?da-x:b))/, 0 => [:group, :options, Group::Options, options: { d: true, i:true }], [0, 1] => [:group, :options, Group::Options, options: { a: true, i: true }, option_changes: { a: true, d: false, x: false}], [0, 1, 0] => [:literal, :literal, Literal, text: 'b', options: { a: true, i: true }] specify('parse group number') do root = RP.parse(/(a)(?=b)((?:c)(d|(e)))/) 
expect(root[0].number).to eq 1 expect(root[1]).not_to respond_to(:number) expect(root[2].number).to eq 2 expect(root[2][0]).not_to respond_to(:number) expect(root[2][1].number).to eq 3 expect(root[2][1][0][1][0].number).to eq 4 end specify('parse group number at level') do root = RP.parse(/(a)(?=b)((?:c)(d|(e)))/) expect(root[0].number_at_level).to eq 1 expect(root[1]).not_to respond_to(:number_at_level) expect(root[2].number_at_level).to eq 2 expect(root[2][0]).not_to respond_to(:number_at_level) expect(root[2][1].number_at_level).to eq 1 expect(root[2][1][0][1][0].number_at_level).to eq 1 end end regexp_parser-1.6.0/spec/parser/anchors_spec.rb0000644000004100000410000000157713541126476021661 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Anchor parsing') do include_examples 'parse', /^a/, 0 => [:anchor, :bol, Anchor::BOL] include_examples 'parse', /a$/, 1 => [:anchor, :eol, Anchor::EOL] include_examples 'parse', /\Aa/, 0 => [:anchor, :bos, Anchor::BOS] include_examples 'parse', /a\z/, 1 => [:anchor, :eos, Anchor::EOS] include_examples 'parse', /a\Z/, 1 => [:anchor, :eos_ob_eol, Anchor::EOSobEOL] include_examples 'parse', /a\b/, 1 => [:anchor, :word_boundary, Anchor::WordBoundary] include_examples 'parse', /a\B/, 1 => [:anchor, :nonword_boundary, Anchor::NonWordBoundary] include_examples 'parse', /a\G/, 1 => [:anchor, :match_start, Anchor::MatchStart] include_examples 'parse', /\\A/, 0 => [:escape, :backslash, EscapeSequence::Literal] end regexp_parser-1.6.0/spec/parser/set/0000755000004100000410000000000013541126476017446 5ustar www-datawww-dataregexp_parser-1.6.0/spec/parser/set/intersections_spec.rb0000644000004100000410000001000613541126476023673 0ustar www-datawww-datarequire 'spec_helper' # edge cases with `...-&&...` and `...&&-...` are checked in ranges_spec.rb RSpec.describe('CharacterSet::Intersection parsing') do specify('parse set intersection') do root = RP.parse('[a&&z]') set = root[0] ints = set[0] expect(set.count).to eq 1 
expect(ints).to be_instance_of(CharacterSet::Intersection) expect(ints.count).to eq 2 seq1, seq2 = ints.expressions expect(seq1).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq1.count).to eq 1 expect(seq1.first.to_s).to eq 'a' expect(seq1.first).to be_instance_of(Literal) expect(seq2).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq2.count).to eq 1 expect(seq2.first.to_s).to eq 'z' expect(seq2.first).to be_instance_of(Literal) expect(set).not_to match 'a' expect(set).not_to match '&' expect(set).not_to match 'z' end specify('parse set intersection range and subset') do root = RP.parse('[a-z&&[^a]]') set = root[0] ints = set[0] expect(set.count).to eq 1 expect(ints).to be_instance_of(CharacterSet::Intersection) expect(ints.count).to eq 2 seq1, seq2 = ints.expressions expect(seq1).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq1.count).to eq 1 expect(seq1.first.to_s).to eq 'a-z' expect(seq1.first).to be_instance_of(CharacterSet::Range) expect(seq2).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq2.count).to eq 1 expect(seq2.first.to_s).to eq '[^a]' expect(seq2.first).to be_instance_of(CharacterSet) expect(set).not_to match 'a' expect(set).not_to match '&' expect(set).to match 'b' end specify('parse set intersection trailing range') do root = RP.parse('[a&&a-z]') set = root[0] ints = set[0] expect(set.count).to eq 1 expect(ints).to be_instance_of(CharacterSet::Intersection) expect(ints.count).to eq 2 seq1, seq2 = ints.expressions expect(seq1).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq1.count).to eq 1 expect(seq1.first.to_s).to eq 'a' expect(seq1.first).to be_instance_of(Literal) expect(seq2).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq2.count).to eq 1 expect(seq2.first.to_s).to eq 'a-z' expect(seq2.first).to be_instance_of(CharacterSet::Range) expect(set).to match 'a' expect(set).not_to match '&' expect(set).not_to match 'b' end specify('parse set intersection type') 
do root = RP.parse('[a&&\\w]') set = root[0] ints = set[0] expect(set.count).to eq 1 expect(ints).to be_instance_of(CharacterSet::Intersection) expect(ints.count).to eq 2 seq1, seq2 = ints.expressions expect(seq1).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq1.count).to eq 1 expect(seq1.first.to_s).to eq 'a' expect(seq1.first).to be_instance_of(Literal) expect(seq2).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq2.count).to eq 1 expect(seq2.first.to_s).to eq '\\w' expect(seq2.first).to be_instance_of(CharacterType::Word) expect(set).to match 'a' expect(set).not_to match '&' expect(set).not_to match 'b' end specify('parse set intersection multipart') do root = RP.parse('[\\h&&\\w&&efg]') set = root[0] ints = set[0] expect(set.count).to eq 1 expect(ints).to be_instance_of(CharacterSet::Intersection) expect(ints.count).to eq 3 seq1, seq2, seq3 = ints.expressions expect(seq1).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq1.count).to eq 1 expect(seq1.first.to_s).to eq '\\h' expect(seq2).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq2.count).to eq 1 expect(seq2.first.to_s).to eq '\\w' expect(seq3).to be_instance_of(CharacterSet::IntersectedSequence) expect(seq3.count).to eq 3 expect(seq3.to_s).to eq 'efg' expect(set).to match 'e' expect(set).to match 'f' expect(set).not_to match 'a' expect(set).not_to match 'g' end end regexp_parser-1.6.0/spec/parser/set/ranges_spec.rb0000644000004100000410000000574213541126476022274 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('CharacterSet::Range parsing') do specify('parse set range') do root = RP.parse('[a-z]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range).to be_instance_of(CharacterSet::Range) expect(range.count).to eq 2 expect(range.first.to_s).to eq 'a' expect(range.first).to be_instance_of(Literal) expect(range.last.to_s).to eq 'z' expect(range.last).to be_instance_of(Literal) expect(set).to match 'm' end specify('parse set 
range hex') do root = RP.parse('[\\x00-\\x99]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range).to be_instance_of(CharacterSet::Range) expect(range.count).to eq 2 expect(range.first.to_s).to eq '\\x00' expect(range.first).to be_instance_of(EscapeSequence::Hex) expect(range.last.to_s).to eq '\\x99' expect(range.last).to be_instance_of(EscapeSequence::Hex) expect(set).to match '\\x50' end specify('parse set range unicode') do root = RP.parse('[\\u{40 42}-\\u1234]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range).to be_instance_of(CharacterSet::Range) expect(range.count).to eq 2 expect(range.first.to_s).to eq '\\u{40 42}' expect(range.first).to be_instance_of(EscapeSequence::CodepointList) expect(range.last.to_s).to eq '\\u1234' expect(range.last).to be_instance_of(EscapeSequence::Codepoint) expect(set).to match '\\u600' end specify('parse set range edge case leading dash') do root = RP.parse('[--z]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range.count).to eq 2 expect(set).to match 'a' end specify('parse set range edge case trailing dash') do root = RP.parse('[!--]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range.count).to eq 2 expect(set).to match '$' end specify('parse set range edge case leading negate') do root = RP.parse('[^-z]') set = root[0] expect(set.count).to eq 2 expect(set).to match 'a' expect(set).not_to match 'z' end specify('parse set range edge case trailing negate') do root = RP.parse('[!-^]') set = root[0] range = set[0] expect(set.count).to eq 1 expect(range.count).to eq 2 expect(set).to match '$' end specify('parse set range edge case leading intersection') do root = RP.parse('[[\\-ab]&&-bc]') set = root[0] expect(set.count).to eq 1 expect(set.first.last.to_s).to eq '-bc' expect(set).to match '-' expect(set).to match 'b' expect(set).not_to match 'a' expect(set).not_to match 'c' end specify('parse set range edge case trailing intersection') do root = 
RP.parse('[bc-&&[\\-ab]]') set = root[0] expect(set.count).to eq 1 expect(set.first.first.to_s).to eq 'bc-' expect(set).to match '-' expect(set).to match 'b' expect(set).not_to match 'a' expect(set).not_to match 'c' end end regexp_parser-1.6.0/spec/expression/0000755000004100000410000000000013541126476017556 5ustar www-datawww-dataregexp_parser-1.6.0/spec/expression/options_spec.rb0000644000004100000410000001214513541126476022613 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Expression#options') do it 'returns a hash of options/flags that affect the expression' do exp = RP.parse(/a/ix)[0] expect(exp).to be_a Literal expect(exp.options).to eq(i: true, x: true) end it 'includes options that are locally enabled via special groups' do exp = RP.parse(/(?x)(?m:a)/i)[1][0] expect(exp).to be_a Literal expect(exp.options).to eq(i: true, m: true, x: true) end it 'excludes locally disabled options' do exp = RP.parse(/(?x)(?-im:a)/i)[1][0] expect(exp).to be_a Literal expect(exp.options).to eq(x: true) end it 'gives correct precedence to negative options' do # Negative options have precedence. E.g. /(?i-i)a/ is case-sensitive. regexp = /(?i-i:a)/ expect(regexp).to match 'a' expect(regexp).not_to match 'A' exp = RP.parse(regexp)[0][0] expect(exp).to be_a Literal expect(exp.options).to eq({}) end it 'correctly handles multiple negative option parts' do regexp = /(?--m--mx--) . /mx expect(regexp).to match ' . ' expect(regexp).not_to match '.' expect(regexp).not_to match "\n" exp = RP.parse(regexp)[2] expect(exp.options).to eq({}) end it 'gives correct precedence when encountering multiple encoding flags' do # Any encoding flag overrides all previous encoding flags. If there are # multiple encoding flags in an options string, the last one wins. # E.g. /(?dau)\w/ matches UTF8 chars but /(?dua)\w/ only ASCII chars. 
regexp1 = /(?dau)\w/ regexp2 = /(?dua)\w/ expect(regexp1).to match 'ü' expect(regexp2).not_to match 'ü' exp1 = RP.parse(regexp1)[1] exp2 = RP.parse(regexp2)[1] expect(exp1.options).to eq(u: true) expect(exp2.options).to eq(a: true) end it 'is accessible via shortcuts' do exp = Root.build expect { exp.options[:i] = true } .to change { exp.i? }.from(false).to(true) .and change { exp.ignore_case? }.from(false).to(true) .and change { exp.case_insensitive? }.from(false).to(true) expect { exp.options[:m] = true } .to change { exp.m? }.from(false).to(true) .and change { exp.multiline? }.from(false).to(true) expect { exp.options[:x] = true } .to change { exp.x? }.from(false).to(true) .and change { exp.extended? }.from(false).to(true) .and change { exp.free_spacing? }.from(false).to(true) expect { exp.options[:a] = true } .to change { exp.a? }.from(false).to(true) .and change { exp.ascii_classes? }.from(false).to(true) expect { exp.options[:d] = true } .to change { exp.d? }.from(false).to(true) .and change { exp.default_classes? }.from(false).to(true) expect { exp.options[:u] = true } .to change { exp.u? }.from(false).to(true) .and change { exp.unicode_classes? 
}.from(false).to(true) end RSpec.shared_examples '#options' do |regexp, klass, at: []| it "works for expression class #{klass}" do exp = RP.parse(/#{regexp.source}/i).dig(*at) expect(exp).to be_a(klass) expect(exp).to be_i expect(exp).not_to be_x end end include_examples '#options', //, Root include_examples '#options', /a/, Literal, at: [0] include_examples '#options', /\A/, Anchor::Base, at: [0] include_examples '#options', /\d/, CharacterType::Base, at: [0] include_examples '#options', /\n/, EscapeSequence::Base, at: [0] include_examples '#options', /\K/, Keep::Mark, at: [0] include_examples '#options', /./, CharacterType::Any, at: [0] include_examples '#options', /(a)/, Group::Base, at: [0] include_examples '#options', /(a)/, Literal, at: [0, 0] include_examples '#options', /(?=a)/, Assertion::Base, at: [0] include_examples '#options', /(?=a)/, Literal, at: [0, 0] include_examples '#options', /(a|b)/, Group::Base, at: [0] include_examples '#options', /(a|b)/, Alternation, at: [0, 0] include_examples '#options', /(a|b)/, Alternative, at: [0, 0, 0] include_examples '#options', /(a|b)/, Literal, at: [0, 0, 0, 0] include_examples '#options', /(a)\1/, Backreference::Base, at: [1] include_examples '#options', /(a)\k<1>/, Backreference::Number, at: [1] include_examples '#options', /(a)\g<1>/, Backreference::NumberCall, at: [1] include_examples '#options', /[a]/, CharacterSet, at: [0] include_examples '#options', /[a]/, Literal, at: [0, 0] include_examples '#options', /[a-z]/, CharacterSet::Range, at: [0, 0] include_examples '#options', /[a-z]/, Literal, at: [0, 0, 0] include_examples '#options', /[a&&z]/, CharacterSet::Intersection, at: [0, 0] include_examples '#options', /[a&&z]/, CharacterSet::IntersectedSequence, at: [0, 0, 0] include_examples '#options', /[a&&z]/, Literal, at: [0, 0, 0, 0] include_examples '#options', /[[:ascii:]]/, PosixClass, at: [0, 0] include_examples '#options', /\p{word}/, UnicodeProperty::Base, at: [0] include_examples '#options', 
/(a)(?(1)b|c)/, Conditional::Expression, at: [1]
  include_examples '#options', /(a)(?(1)b|c)/, Conditional::Condition, at: [1, 0]
  include_examples '#options', /(a)(?(1)b|c)/, Conditional::Branch, at: [1, 1]
  include_examples '#options', /(a)(?(1)b|c)/, Literal, at: [1, 1, 0]
end

# regexp_parser-1.6.0/spec/expression/base_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Expression::Base) do
  specify('#to_re') do
    re_text = '^a*(b([cde]+))+f?$'

    re = RP.parse(re_text).to_re

    expect(re).to be_a(::Regexp)
    expect(re_text).to eq re.source
  end

  specify('#level') do
    regexp = /^a(b(c(d)))e$/
    root = RP.parse(regexp)

    ['^', 'a', '(b(c(d)))', 'e', '$'].each_with_index do |t, i|
      expect(root[i].to_s).to eq t
      expect(root[i].level).to eq 0
    end

    expect(root[2][0].to_s).to eq 'b'
    expect(root[2][0].level).to eq 1

    expect(root[2][1][0].to_s).to eq 'c'
    expect(root[2][1][0].level).to eq 2

    expect(root[2][1][1][0].to_s).to eq 'd'
    expect(root[2][1][1][0].level).to eq 3
  end

  specify('#terminal?') do
    root = RP.parse('^a([b]+)c$')

    expect(root).not_to be_terminal

    expect(root[0]).to be_terminal
    expect(root[1]).to be_terminal
    expect(root[2]).not_to be_terminal
    expect(root[2][0]).not_to be_terminal
    expect(root[2][0][0]).to be_terminal
    expect(root[3]).to be_terminal
    expect(root[4]).to be_terminal
  end

  specify('alt #terminal?') do
    root = RP.parse('^(ab|cd)$')

    expect(root).not_to be_terminal

    expect(root[0]).to be_terminal
    expect(root[1]).not_to be_terminal
    expect(root[1][0]).not_to be_terminal
    expect(root[1][0][0]).not_to be_terminal
    expect(root[1][0][0][0]).to be_terminal
    expect(root[1][0][1]).not_to be_terminal
    expect(root[1][0][1][0]).to be_terminal
  end

  specify('#coded_offset') do
    root = RP.parse('^a*(b+(c?))$')

    expect(root.coded_offset).to eq '@0+12'

    [
      ['@0+1', '^'],
      ['@1+2', 'a*'],
      ['@3+8', '(b+(c?))'],
      ['@11+1', '$'],
    ].each_with_index do |check, i|
      against = [root[i].coded_offset, root[i].to_s]

      expect(against).to eq check
    end
    expect([root[2][0].coded_offset, root[2][0].to_s]).to eq ['@4+2', 'b+']
    expect([root[2][1].coded_offset, root[2][1].to_s]).to eq ['@6+4', '(c?)']
    expect([root[2][1][0].coded_offset, root[2][1][0].to_s]).to eq ['@7+2', 'c?']
  end

  specify('#quantity') do
    expect(RP.parse(/aa/)[0].quantity).to eq [nil, nil]
    expect(RP.parse(/a?/)[0].quantity).to eq [0, 1]
    expect(RP.parse(/a*/)[0].quantity).to eq [0, -1]
    expect(RP.parse(/a+/)[0].quantity).to eq [1, -1]
  end

  specify('#repetitions') do
    expect(RP.parse(/aa/)[0].repetitions).to eq 1..1
    expect(RP.parse(/a?/)[0].repetitions).to eq 0..1
    expect(RP.parse(/a*/)[0].repetitions).to eq 0..(Float::INFINITY)
    expect(RP.parse(/a+/)[0].repetitions).to eq 1..(Float::INFINITY)
  end
end

# regexp_parser-1.6.0/spec/expression/methods/strfregexp_spec.rb
require 'spec_helper'

RSpec.describe('Expression#strfregexp') do
  specify('#strfre alias') do
    expect(RP.parse(/a/)).to respond_to(:strfre)
  end

  specify('#strfregexp level') do
    root = RP.parse(/a(b(c))/)

    expect(root.strfregexp('%l')).to eq 'root'

    a = root.first
    expect(a.strfregexp('%%l')).to eq '%0'

    b = root[1].first
    expect(b.strfregexp('<%l>')).to eq '<1>'

    c = root[1][1].first
    expect(c.strfregexp('[at: %l]')).to eq '[at: 2]'
  end

  specify('#strfregexp start end') do
    root = RP.parse(/a(b(c))/)

    expect(root.strfregexp('%s')).to eq '0'
    expect(root.strfregexp('%e')).to eq '7'

    a = root.first
    expect(a.strfregexp('%%s')).to eq '%0'
    expect(a.strfregexp('%e')).to eq '1'

    group_1 = root[1]
    expect(group_1.strfregexp('GRP:%s')).to eq 'GRP:1'
    expect(group_1.strfregexp('%e')).to eq '7'

    b = group_1.first
    expect(b.strfregexp('<@%s>')).to eq '<@2>'
    expect(b.strfregexp('%e')).to eq '3'

    c = group_1.last.first
    expect(c.strfregexp('[at: %s]')).to eq '[at: 4]'
    expect(c.strfregexp('%e')).to eq '5'
  end

  specify('#strfregexp length') do
    root =
RP.parse(/a[b]c/)

    expect(root.strfregexp('%S')).to eq '5'

    a = root.first
    expect(a.strfregexp('%S')).to eq '1'

    set = root[1]
    expect(set.strfregexp('%S')).to eq '3'
  end

  specify('#strfregexp coded offset') do
    root = RP.parse(/a[b]c/)

    expect(root.strfregexp('%o')).to eq '@0+5'

    a = root.first
    expect(a.strfregexp('%o')).to eq '@0+1'

    set = root[1]
    expect(set.strfregexp('%o')).to eq '@1+3'
  end

  specify('#strfregexp type token') do
    root = RP.parse(/a[b](c)/)

    expect(root.strfregexp('%y')).to eq 'expression'
    expect(root.strfregexp('%k')).to eq 'root'
    expect(root.strfregexp('%i')).to eq 'expression:root'
    expect(root.strfregexp('%c')).to eq 'Regexp::Expression::Root'

    a = root.first
    expect(a.strfregexp('%y')).to eq 'literal'
    expect(a.strfregexp('%k')).to eq 'literal'
    expect(a.strfregexp('%i')).to eq 'literal:literal'
    expect(a.strfregexp('%c')).to eq 'Regexp::Expression::Literal'

    set = root[1]
    expect(set.strfregexp('%y')).to eq 'set'
    expect(set.strfregexp('%k')).to eq 'character'
    expect(set.strfregexp('%i')).to eq 'set:character'
    expect(set.strfregexp('%c')).to eq 'Regexp::Expression::CharacterSet'

    group = root.last
    expect(group.strfregexp('%y')).to eq 'group'
    expect(group.strfregexp('%k')).to eq 'capture'
    expect(group.strfregexp('%i')).to eq 'group:capture'
    expect(group.strfregexp('%c')).to eq 'Regexp::Expression::Group::Capture'
  end

  specify('#strfregexp quantifier') do
    root = RP.parse(/a+[b](c)?d{3,4}/)

    expect(root.strfregexp('%q')).to eq '{1}'
    expect(root.strfregexp('%Q')).to eq ''
    expect(root.strfregexp('%z, %Z')).to eq '1, 1'

    a = root.first
    expect(a.strfregexp('%q')).to eq '{1, or-more}'
    expect(a.strfregexp('%Q')).to eq '+'
    expect(a.strfregexp('%z, %Z')).to eq '1, -1'

    set = root[1]
    expect(set.strfregexp('%q')).to eq '{1}'
    expect(set.strfregexp('%Q')).to eq ''
    expect(set.strfregexp('%z, %Z')).to eq '1, 1'

    group = root[2]
    expect(group.strfregexp('%q')).to eq '{0, 1}'
    expect(group.strfregexp('%Q')).to eq '?'
    expect(group.strfregexp('%z, %Z')).to eq '0, 1'

    d = root.last
    expect(d.strfregexp('%q')).to eq '{3, 4}'
    expect(d.strfregexp('%Q')).to eq '{3,4}'
    expect(d.strfregexp('%z, %Z')).to eq '3, 4'
  end

  specify('#strfregexp text') do
    root = RP.parse(/a(b(c))|[d-gk-p]+/)

    expect(root.strfregexp('%t')).to eq 'a(b(c))|[d-gk-p]+'
    expect(root.strfregexp('%~t')).to eq 'expression:root'

    alt = root.first
    expect(alt.strfregexp('%t')).to eq 'a(b(c))|[d-gk-p]+'
    expect(alt.strfregexp('%T')).to eq 'a(b(c))|[d-gk-p]+'
    expect(alt.strfregexp('%~t')).to eq 'meta:alternation'

    seq_1 = alt.first
    expect(seq_1.strfregexp('%t')).to eq 'a(b(c))'
    expect(seq_1.strfregexp('%T')).to eq 'a(b(c))'
    expect(seq_1.strfregexp('%~t')).to eq 'expression:sequence'

    group = seq_1[1]
    expect(group.strfregexp('%t')).to eq '(b(c))'
    expect(group.strfregexp('%T')).to eq '(b(c))'
    expect(group.strfregexp('%~t')).to eq 'group:capture'

    seq_2 = alt.last
    expect(seq_2.strfregexp('%t')).to eq '[d-gk-p]+'
    expect(seq_2.strfregexp('%T')).to eq '[d-gk-p]+'

    set = seq_2.first
    expect(set.strfregexp('%t')).to eq '[d-gk-p]'
    expect(set.strfregexp('%T')).to eq '[d-gk-p]+'
    expect(set.strfregexp('%~t')).to eq 'set:character'
  end

  specify('#strfregexp combined') do
    root = RP.parse(/a{5}|[b-d]+/)

    expect(root.strfregexp('%b')).to eq '@0+11 expression:root'
    expect(root.strfregexp('%b')).to eq root.strfregexp('%o %i')

    expect(root.strfregexp('%m')).to eq '@0+11 expression:root {1}'
    expect(root.strfregexp('%m')).to eq root.strfregexp('%b %q')

    expect(root.strfregexp('%a')).to eq '@0+11 expression:root {1} a{5}|[b-d]+'
    expect(root.strfregexp('%a')).to eq root.strfregexp('%m %t')
  end

  specify('#strfregexp conditional') do
    root = RP.parse('(?a)(?()b|c)', 'ruby/2.0')

    expect { root.strfregexp }.not_to(raise_error)
  end

  specify('#strfregexp_tree') do
    root = RP.parse(/a[b-d]*(e(f+))?/)

    expect(root.strfregexp_tree('%>%o %~t')).to eq(
      "@0+15 expression:root\n" +
      " @0+1 a\n" +
      " @1+6 set:character\n" +
      " @2+3 set:range\n" +
      " @2+1 b\n" +
      " @4+1 d\n" +
      " @7+8 
group:capture\n" +
      " @8+1 e\n" +
      " @9+4 group:capture\n" +
      " @10+2 f+"
    )
  end

  specify('#strfregexp_tree separator') do
    root = RP.parse(/a[b-d]*(e(f+))?/)

    expect(root.strfregexp_tree('%>%o %~t', true, '-SEP-')).to eq(
      "@0+15 expression:root-SEP-" +
      " @0+1 a-SEP-" +
      " @1+6 set:character-SEP-" +
      " @2+3 set:range-SEP-" +
      " @2+1 b-SEP-" +
      " @4+1 d-SEP-" +
      " @7+8 group:capture-SEP-" +
      " @8+1 e-SEP-" +
      " @9+4 group:capture-SEP-" +
      " @10+2 f+"
    )
  end

  specify('#strfregexp_tree excluding self') do
    root = RP.parse(/a[b-d]*(e(f+))?/)

    expect(root.strfregexp_tree('%>%o %~t', false)).to eq(
      "@0+1 a\n" +
      "@1+6 set:character\n" +
      " @2+3 set:range\n" +
      " @2+1 b\n" +
      " @4+1 d\n" +
      "@7+8 group:capture\n" +
      " @8+1 e\n" +
      " @9+4 group:capture\n" +
      " @10+2 f+"
    )
  end
end

# regexp_parser-1.6.0/spec/expression/methods/match_length_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::MatchLength) do
  ML = described_class

  specify('literal') { expect(ML.of(/a/).minmax).to eq [1, 1] }
  specify('literal sequence') { expect(ML.of(/abc/).minmax).to eq [3, 3] }
  specify('dot') { expect(ML.of(/./).minmax).to eq [1, 1] }
  specify('set') { expect(ML.of(/[abc]/).minmax).to eq [1, 1] }
  specify('type') { expect(ML.of(/\d/).minmax).to eq [1, 1] }
  specify('escape') { expect(ML.of(/\n/).minmax).to eq [1, 1] }
  specify('property') { expect(ML.of(/\p{ascii}/).minmax).to eq [1, 1] }
  specify('codepoint list') { expect(ML.of(/\u{61 62 63}/).minmax).to eq [3, 3] }
  specify('multi-char literal') { expect(ML.of(/abc/).minmax).to eq [3, 3] }
  specify('fixed quantified') { expect(ML.of(/a{5}/).minmax).to eq [5, 5] }
  specify('range quantified') { expect(ML.of(/a{5,9}/).minmax).to eq [5, 9] }
  specify('nested quantified') { expect(ML.of(/(a{2}){3,4}/).minmax).to eq [6, 8] }
  specify('open-end quantified') { expect(ML.of(/a*/).minmax).to eq [0, Float::INFINITY] }
  specify('empty subexpression') { expect(ML.of(//).minmax).to eq [0, 0] }
  specify('anchor') {
expect(ML.of(/^$/).minmax).to eq [0, 0] }
  specify('lookaround') { expect(ML.of(/(?=abc)/).minmax).to eq [0, 0] }
  specify('free space') { expect(ML.of(/ /x).minmax).to eq [0, 0] }
  specify('comment') { expect(ML.of(/(?#comment)/x).minmax).to eq [0, 0] }
  specify('backreference') { expect(ML.of(/(abc){2}\1/).minmax).to eq [9, 9] }
  specify('subexp call') { expect(ML.of(/(abc){2}\g<-1>/).minmax).to eq [9, 9] }
  specify('alternation') { expect(ML.of(/a|bcde/).minmax).to eq [1, 4] }
  specify('nested alternation') { expect(ML.of(/a|bc(d|efg)/).minmax).to eq [1, 5] }
  specify('quantified alternation') { expect(ML.of(/a|bcde?/).minmax).to eq [1, 4] }

  if ruby_version_at_least('2.4.1')
    specify('absence group') { expect(ML.of('(?~abc)').minmax).to eq [0, Float::INFINITY] }
  end

  specify('raises for missing references') do
    exp = RP.parse(/(a)\1/).last
    exp.referenced_expression = nil
    expect { exp.match_length }.to raise_error(ArgumentError)
  end

  describe('::of') do
    it('works with Regexps') { expect(ML.of(/foo/).minmax).to eq [3, 3] }
    it('works with Strings') { expect(ML.of('foo').minmax).to eq [3, 3] }
    it('works with Expressions') { expect(ML.of(RP.parse(/foo/)).minmax).to eq [3, 3] }
  end

  describe('Expression#match_length') do
    it('returns the MatchLength') { expect(RP.parse(/abc/).match_length.minmax).to eq [3, 3] }
  end

  describe('Expression#inner_match_length') do
    it 'returns the MatchLength of an expression that does not count towards parent match_length' do
      exp = RP.parse(/(?=ab|cdef)/)[0]
      expect(exp).to be_a Regexp::Expression::Assertion::Base

      expect(exp.match_length.minmax).to eq [0, 0]
      expect(exp.inner_match_length.minmax).to eq [2, 4]
    end
  end

  describe('#include?') do
    specify('unquantified') do
      expect(ML.of(/a/)).to include 1
      expect(ML.of(/a/)).not_to include 0
      expect(ML.of(/a/)).not_to include 2
    end

    specify('fixed quantified') do
      expect(ML.of(/a{5}/)).to include 5
      expect(ML.of(/a{5}/)).not_to include 0
      expect(ML.of(/a{5}/)).not_to include 4
      expect(ML.of(/a{5}/)).not_to include 6
    end

    specify('variably quantified') do
      expect(ML.of(/a?/)).to include 0
      expect(ML.of(/a?/)).to include 1
      expect(ML.of(/a?/)).not_to include 2
    end

    specify('nested quantified') do
      expect(ML.of(/(a{2}){3,4}/)).to include 6
      expect(ML.of(/(a{2}){3,4}/)).to include 8
      expect(ML.of(/(a{2}){3,4}/)).not_to include 0
      expect(ML.of(/(a{2}){3,4}/)).not_to include 5
      expect(ML.of(/(a{2}){3,4}/)).not_to include 7
      expect(ML.of(/(a{2}){3,4}/)).not_to include 9
    end

    specify('branches') do
      expect(ML.of(/ab|cdef/)).to include 2
      expect(ML.of(/ab|cdef/)).to include 4
      expect(ML.of(/ab|cdef/)).not_to include 0
      expect(ML.of(/ab|cdef/)).not_to include 3
      expect(ML.of(/ab|cdef/)).not_to include 5
    end

    specify('called on leaf node') do
      expect(ML.of(RP.parse(/a{2}/)[0])).to include 2
      expect(ML.of(RP.parse(/a{2}/)[0])).not_to include 0
      expect(ML.of(RP.parse(/a{2}/)[0])).not_to include 1
      expect(ML.of(RP.parse(/a{2}/)[0])).not_to include 3
    end
  end

  describe('#fixed?') do
    specify('unquantified') { expect(ML.of(/a/)).to be_fixed }
    specify('fixed quantified') { expect(ML.of(/a{5}/)).to be_fixed }
    specify('variably quantified') { expect(ML.of(/a?/)).not_to be_fixed }
    specify('equal branches') { expect(ML.of(/ab|cd/)).to be_fixed }
    specify('unequal branches') { expect(ML.of(/ab|cdef/)).not_to be_fixed }
    specify('equal quantified branches') { expect(ML.of(/a{2}|cd/)).to be_fixed }
    specify('unequal quantified branches') { expect(ML.of(/a{3}|cd/)).not_to be_fixed }
    specify('empty') { expect(ML.of(//)).to be_fixed }
  end

  describe('#each') do
    it 'returns an Enumerator if called without a block' do
      result = ML.of(/a?/).each
      expect(result).to be_a(Enumerator)
      expect(result.next).to eq 0
      expect(result.next).to eq 1
      expect { result.next }.to raise_error(StopIteration)
    end

    it 'is limited to 1000 iterations in case there are infinite match lengths' do
      expect(ML.of(/a*/).first(3000).size).to eq 1000
    end

    it 'scaffolds the Enumerable interface' do
      expect(ML.of(/abc|defg/).count).to eq 2
      expect(ML.of(/(ab)*/).first(5)).to
eq [0, 2, 4, 6, 8]
      expect(ML.of(/a{,10}/).any? { |len| len > 20 }).to be false
    end
  end

  describe('#endless_each') do
    it 'returns an Enumerator if called without a block' do
      result = ML.of(/a?/).endless_each
      expect(result).to be_a(Enumerator)
      expect(result.next).to eq 0
      expect(result.next).to eq 1
      expect { result.next }.to raise_error(StopIteration)
    end

    it 'never stops iterating for infinite match lengths' do
      expect(ML.of(/a*/).endless_each.first(3000).size).to eq 3000
    end
  end

  describe('#inspect') do
    it 'is nice' do
      result = RP.parse(/a{2,4}/)[0].match_length
      expect(result.inspect).to eq '# min=2 max=4>'
    end
  end
end

# regexp_parser-1.6.0/spec/expression/methods/traverse_spec.rb
require 'spec_helper'

RSpec.describe('Subexpression#traverse') do
  specify('Subexpression#traverse') do
    root = RP.parse(/a(b(c(d)))|g[h-i]j|klmn/)

    enters = 0
    visits = 0
    exits = 0

    root.traverse do |event, _exp, _index|
      enters = (enters + 1) if event == :enter
      visits = (visits + 1) if event == :visit
      exits = (exits + 1) if event == :exit
    end

    expect(enters).to eq 9
    expect(enters).to eq exits

    expect(visits).to eq 9
  end

  specify('Subexpression#traverse including self') do
    root = RP.parse(/a(b(c(d)))|g[h-i]j|klmn/)

    enters = 0
    visits = 0
    exits = 0

    root.traverse(true) do |event, _exp, _index|
      enters = (enters + 1) if event == :enter
      visits = (visits + 1) if event == :visit
      exits = (exits + 1) if event == :exit
    end

    expect(enters).to eq 10
    expect(enters).to eq exits

    expect(visits).to eq 9
  end

  specify('Subexpression#walk alias') do
    root = RP.parse(/abc/)

    expect(root).to respond_to(:walk)
  end

  specify('Subexpression#each_expression') do
    root = RP.parse(/a(?x:b(c))|g[h-k]/)

    count = 0
    root.each_expression { count += 1 }

    expect(count).to eq 13
  end

  specify('Subexpression#each_expression including self') do
    root = RP.parse(/a(?x:b(c))|g[h-k]/)

    count = 0
    root.each_expression(true) { count += 1 }

    expect(count).to eq 14
  end
  specify('Subexpression#each_expression indices') do
    root = RP.parse(/a(b)c/)

    indices = []
    root.each_expression { |_exp, index| (indices << index) }

    expect(indices).to eq [0, 1, 0, 2]
  end

  specify('Subexpression#each_expression indices including self') do
    root = RP.parse(/a(b)c/)

    indices = []
    root.each_expression(true) { |_exp, index| (indices << index) }

    expect(indices).to eq [0, 0, 1, 0, 2]
  end

  specify('Subexpression#flat_map without block') do
    root = RP.parse(/a(b([c-e]+))?/)

    array = root.flat_map

    expect(array).to be_instance_of(Array)
    expect(array.length).to eq 8

    array.each do |item|
      expect(item).to be_instance_of(Array)
      expect(item.length).to eq 2
      expect(item.first).to be_a(Regexp::Expression::Base)
      expect(item.last).to be_a(Integer)
    end
  end

  specify('Subexpression#flat_map without block including self') do
    root = RP.parse(/a(b([c-e]+))?/)

    array = root.flat_map(true)

    expect(array).to be_instance_of(Array)
    expect(array.length).to eq 9
  end

  specify('Subexpression#flat_map indices') do
    root = RP.parse(/a(b([c-e]+))?f*g/)

    indices = root.flat_map { |_exp, index| index }

    expect(indices).to eq [0, 1, 0, 1, 0, 0, 0, 1, 2, 3]
  end

  specify('Subexpression#flat_map indices including self') do
    root = RP.parse(/a(b([c-e]+))?f*g/)

    indices = root.flat_map(true) { |_exp, index| index }

    expect(indices).to eq [0, 0, 1, 0, 1, 0, 0, 0, 1, 2, 3]
  end

  specify('Subexpression#flat_map expressions') do
    root = RP.parse(/a(b(c(d)))/)

    levels = root.flat_map { |exp, _index| [exp.level, exp.text] if exp.terminal?
}.compact

    expect(levels).to eq [[0, 'a'], [1, 'b'], [2, 'c'], [3, 'd']]
  end

  specify('Subexpression#flat_map expressions including self') do
    root = RP.parse(/a(b(c(d)))/)

    levels = root.flat_map(true) { |exp, _index| [exp.level, exp.to_s] }.compact

    expect(levels).to eq [[nil, 'a(b(c(d)))'], [0, 'a'], [0, '(b(c(d)))'], [1, 'b'], [1, '(c(d))'], [2, 'c'], [2, '(d)'], [3, 'd']]
  end
end

# regexp_parser-1.6.0/spec/expression/methods/tests_spec.rb
require 'spec_helper'

RSpec.describe('ExpressionTests') do
  specify('#type?') do
    root = RP.parse(/abcd|(ghij)|[klmn]/)

    alt = root.first

    expect(alt.type?(:meta)).to be true
    expect(alt.type?(:escape)).to be false
    expect(alt.type?(%i[meta escape])).to be true
    expect(alt.type?(%i[literal escape])).to be false
    expect(alt.type?(:*)).to be true
    expect(alt.type?([:*])).to be true
    expect(alt.type?(%i[literal escape *])).to be true

    seq_1 = alt[0]
    expect(seq_1.type?(:expression)).to be true
    expect(seq_1.first.type?(:literal)).to be true

    seq_2 = alt[1]
    expect(seq_2.type?(:*)).to be true
    expect(seq_2.first.type?(:group)).to be true

    seq_3 = alt[2]
    expect(seq_3.first.type?(:set)).to be true
  end

  specify('#is?') do
    root = RP.parse(/.+|\.?/)

    expect(root.is?(:*)).to be true

    alt = root.first
    expect(alt.is?(:*)).to be true
    expect(alt.is?(:alternation)).to be true
    expect(alt.is?(:alternation, :meta)).to be true

    seq_1 = alt[0]
    expect(seq_1.is?(:sequence)).to be true
    expect(seq_1.is?(:sequence, :expression)).to be true

    expect(seq_1.first.is?(:dot)).to be true
    expect(seq_1.first.is?(:dot, :escape)).to be false
    expect(seq_1.first.is?(:dot, :meta)).to be true
    expect(seq_1.first.is?(:dot, %i[escape meta])).to be true

    seq_2 = alt[1]
    expect(seq_2.first.is?(:dot)).to be true
    expect(seq_2.first.is?(:dot, :escape)).to be true
    expect(seq_2.first.is?(:dot, :meta)).to be false
    expect(seq_2.first.is?(:dot, %i[meta escape])).to be true
  end

  specify('#one_of?') do
    root = RP.parse(/\Aab(c[\w])d|e.\z/)
    expect(root.one_of?(:*)).to be true
    expect(root.one_of?(:* => :*)).to be true
    expect(root.one_of?(:* => [:*])).to be true

    alt = root.first
    expect(alt.one_of?(:*)).to be true
    expect(alt.one_of?(:meta)).to be true
    expect(alt.one_of?(:meta, :alternation)).to be true
    expect(alt.one_of?(meta: %i[dot bogus])).to be false
    expect(alt.one_of?(meta: %i[dot alternation])).to be true

    seq_1 = alt[0]
    expect(seq_1.one_of?(:expression)).to be true
    expect(seq_1.one_of?(expression: :sequence)).to be true

    expect(seq_1.first.one_of?(:anchor)).to be true
    expect(seq_1.first.one_of?(anchor: :bos)).to be true
    expect(seq_1.first.one_of?(anchor: :eos)).to be false
    expect(seq_1.first.one_of?(anchor: %i[escape meta bos])).to be true
    expect(seq_1.first.one_of?(anchor: %i[escape meta eos])).to be false

    seq_2 = alt[1]
    expect(seq_2.first.one_of?(:literal)).to be true

    expect(seq_2[1].one_of?(:meta)).to be true
    expect(seq_2[1].one_of?(meta: :dot)).to be true
    expect(seq_2[1].one_of?(meta: :alternation)).to be false
    expect(seq_2[1].one_of?(meta: [:dot])).to be true

    expect(seq_2.last.one_of?(:group)).to be false
    expect(seq_2.last.one_of?(group: [:*])).to be false
    expect(seq_2.last.one_of?(group: [:*], meta: :*)).to be false

    expect(seq_2.last.one_of?(:meta => [:*], :* => :*)).to be true
    expect(seq_2.last.one_of?(meta: [:*], anchor: :*)).to be true
    expect(seq_2.last.one_of?(meta: [:*], anchor: :eos)).to be true
    expect(seq_2.last.one_of?(meta: [:*], anchor: [:bos])).to be false
    expect(seq_2.last.one_of?(meta: [:*], anchor: %i[bos eos])).to be true

    expect { root.one_of?(Object.new) }.to raise_error(ArgumentError)
  end
end

# regexp_parser-1.6.0/spec/expression/methods/match_spec.rb
require 'spec_helper'

RSpec.describe('Expression#match') do
  it 'returns the #match result of the respective Regexp' do
    expect(RP.parse(/a/).match('a')[0]).to eq 'a'
  end

  it 'can be given an offset, just like Regexp#match' do
    expect(RP.parse(/./).match('ab',
1)[0]).to eq 'b'
  end

  it 'works with the #=~ alias' do
    expect(RP.parse(/a/) =~ 'a').to be_a MatchData
  end
end

RSpec.describe('Expression#match?') do
  it 'returns true if the Respective Regexp matches' do
    expect(RP.parse(/a/).match?('a')).to be true
  end

  it 'returns false if the Respective Regexp does not match' do
    expect(RP.parse(/a/).match?('b')).to be false
  end
end

# regexp_parser-1.6.0/spec/expression/to_h_spec.rb
require 'spec_helper'

RSpec.describe('Expression#to_h') do
  specify('Root#to_h') do
    root = RP.parse('abc')

    hash = root.to_h

    expect(
      token: :root,
      type: :expression,
      text: 'abc',
      starts_at: 0,
      length: 3,
      quantifier: nil,
      options: {},
      level: nil,
      set_level: nil,
      conditional_level: nil,
      expressions: [
        {
          token: :literal,
          type: :literal,
          text: 'abc',
          starts_at: 0,
          length: 3,
          quantifier: nil,
          options: {},
          level: 0,
          set_level: 0,
          conditional_level: 0
        }
      ]
    ).to eq hash
  end

  specify('Quantifier#to_h') do
    root = RP.parse('a{2,4}')
    exp = root.expressions.at(0)
    hash = exp.quantifier.to_h

    expect(max: 4, min: 2, mode: :greedy, text: '{2,4}', token: :interval).to eq hash
  end

  specify('Conditional#to_h') do
    root = RP.parse('(?a)(?()b|c)', 'ruby/2.0')

    expect { root.to_h }.not_to(raise_error)
  end
end

# regexp_parser-1.6.0/spec/expression/sequence_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Expression::Sequence) do
  describe('#initialize') do
    it 'supports the old, nonstandard arity for backwards compatibility' do
      expect { Sequence.new(0, 0, 0) }.to output.to_stderr
    end
  end
end

# regexp_parser-1.6.0/spec/expression/free_space_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Expression::FreeSpace) do
  specify('white space quantify raises error') do
    regexp = /
      a # Comment
    /x

    root = RP.parse(regexp)
    space = root[0]

    expect(space).to be_instance_of(FreeSpace::WhiteSpace)
    expect { space.quantify(:dummy, '#') }.to raise_error(RuntimeError)
  end

  specify('comment quantify raises error') do
    regexp = /
      a # Comment
    /x

    root = RP.parse(regexp)
    comment = root[3]

    expect(comment).to be_instance_of(FreeSpace::Comment)
    expect { comment.quantify(:dummy, '#') }.to raise_error(RuntimeError)
  end
end

# regexp_parser-1.6.0/spec/expression/subexpression_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Expression::Subexpression) do
  specify('#ts, #te') do
    regx = /abcd|ghij|klmn|pqur/
    root = RP.parse(regx)

    alt = root.first

    { 0 => [0, 4], 1 => [5, 9], 2 => [10, 14], 3 => [15, 19] }.each do |index, span|
      sequence = alt[index]

      expect(sequence.ts).to eq span[0]
      expect(sequence.te).to eq span[1]
    end
  end

  specify('#nesting_level') do
    root = RP.parse(/a(b(\d|[ef-g[h]]))/)

    tests = {
      'a' => 1,
      'b' => 2,
      '\d|[ef-g[h]]' => 3, # alternation
      '\d' => 4, # first alternative
      '[ef-g[h]]' => 4, # second alternative
      'e' => 5,
      'f-g' => 5,
      'f' => 6,
      'g' => 6,
      'h' => 6,
    }

    root.each_expression do |exp|
      next unless expected_nesting_level = tests.delete(exp.to_s)
      expect(expected_nesting_level).to eq exp.nesting_level
    end

    expect(tests).to be_empty
  end

  specify('#dig') do
    root = RP.parse(/(((a)))/)

    expect(root.dig(0).to_s).to eq '(((a)))'
    expect(root.dig(0, 0, 0, 0).to_s).to eq 'a'
    expect(root.dig(0, 0, 0, 0, 0)).to be_nil
    expect(root.dig(3, 7)).to be_nil
  end
end

# regexp_parser-1.6.0/spec/expression/conditional_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Expression::Conditional) do
  let(:root) { RP.parse('^(a(b))(b(?(1)c|(?(2)d|(?(3)e|f)))g)$') }
  let(:cond_1) { root[2][1] }
  let(:cond_2) { root[2][1][2][0] }
  let(:cond_3) { root[2][1][2][0][2][0] }

  specify('root level') do
    [
      '^',
      '(a(b))',
      '(b(?(1)c|(?(2)d|(?(3)e|f)))g)',
      '$'
    ].each_with_index do |t, i|
      expect(root[i].conditional_level).to eq 0
      expect(root[i].to_s).to eq t
    end
    expect(root[2][0].to_s).to eq 'b'
    expect(root[2][0].conditional_level).to eq 0
  end

  specify('level one') do
    condition = cond_1.condition
    branch_1 = cond_1.branches.first

    expect(condition).to be_a Conditional::Condition
    expect(condition.to_s).to eq '(1)'
    expect(condition.conditional_level).to eq 1

    expect(branch_1).to be_a Conditional::Branch
    expect(branch_1.to_s).to eq 'c'
    expect(branch_1.conditional_level).to eq 1

    expect(branch_1.first.to_s).to eq 'c'
    expect(branch_1.first.conditional_level).to eq 1
  end

  specify('level two') do
    condition = cond_2.condition
    branch_1 = cond_2.branches.first
    branch_2 = cond_2.branches.last

    expect(cond_2.to_s).to start_with '(?'
    expect(cond_2.conditional_level).to eq 1

    expect(condition).to be_a Conditional::Condition
    expect(condition.to_s).to eq '(2)'
    expect(condition.conditional_level).to eq 2

    expect(branch_1).to be_a Conditional::Branch
    expect(branch_1.to_s).to eq 'd'
    expect(branch_1.conditional_level).to eq 2

    expect(branch_1.first.to_s).to eq 'd'
    expect(branch_1.first.conditional_level).to eq 2

    expect(branch_2.first.to_s).to start_with '(?'
    expect(branch_2.first.conditional_level).to eq 2
  end

  specify('level three') do
    condition = cond_3.condition
    branch_1 = cond_3.branches.first
    branch_2 = cond_3.branches.last

    expect(condition).to be_a Conditional::Condition
    expect(condition.to_s).to eq '(3)'
    expect(condition.conditional_level).to eq 3

    expect(cond_3.to_s).to eq '(?(3)e|f)'
    expect(cond_3.conditional_level).to eq 2

    expect(branch_1).to be_a Conditional::Branch
    expect(branch_1.to_s).to eq 'e'
    expect(branch_1.conditional_level).to eq 3

    expect(branch_1.first.to_s).to eq 'e'
    expect(branch_1.first.conditional_level).to eq 3

    expect(branch_2).to be_a Conditional::Branch
    expect(branch_2.to_s).to eq 'f'
    expect(branch_2.conditional_level).to eq 3

    expect(branch_2.first.to_s).to eq 'f'
    expect(branch_2.first.conditional_level).to eq 3
  end
end

# regexp_parser-1.6.0/spec/expression/clone_spec.rb
require 'spec_helper'

RSpec.describe('Expression#clone') do
  specify('Base#clone') do
    root = RP.parse(/^(?i:a)b+$/i)
    copy = root.clone

    expect(copy.to_s).to eq root.to_s

    expect(root.object_id).not_to eq copy.object_id
    expect(root.text).to eq copy.text
    expect(root.text.object_id).not_to eq copy.text.object_id

    root_1 = root[1]
    copy_1 = copy[1]

    expect(root_1.options).to eq copy_1.options
    expect(root_1.options.object_id).not_to eq copy_1.options.object_id

    root_2 = root[2]
    copy_2 = copy[2]

    expect(root_2).to be_quantified
    expect(copy_2).to be_quantified
    expect(root_2.quantifier.text).to eq copy_2.quantifier.text
    expect(root_2.quantifier.text.object_id).not_to eq copy_2.quantifier.text.object_id
    expect(root_2.quantifier.object_id).not_to eq copy_2.quantifier.object_id

    # regression test
    expect { root_2.clone }.not_to change { root_2.quantifier.object_id }
    expect { root_2.clone }.not_to change { root_2.quantifier.text.object_id }
  end

  specify('Subexpression#clone') do
    root = RP.parse(/^a(b([cde])f)g$/)
    copy = root.clone

    expect(copy.to_s).to eq root.to_s

    expect(root).to
respond_to(:expressions)
    expect(copy).to respond_to(:expressions)
    expect(root.expressions.object_id).not_to eq copy.expressions.object_id

    copy.expressions.each_with_index do |exp, index|
      expect(root[index].object_id).not_to eq exp.object_id
    end

    copy[2].each_with_index do |exp, index|
      expect(root[2][index].object_id).not_to eq exp.object_id
    end

    # regression test
    expect { root.clone }.not_to change { root.expressions.object_id }
  end

  specify('Group::Named#clone') do
    root = RP.parse('^(?a)+bc$')
    copy = root.clone

    expect(copy.to_s).to eq root.to_s

    root_1 = root[1]
    copy_1 = copy[1]

    expect(root_1.name).to eq copy_1.name
    expect(root_1.name.object_id).not_to eq copy_1.name.object_id
    expect(root_1.text).to eq copy_1.text
    expect(root_1.expressions.object_id).not_to eq copy_1.expressions.object_id

    copy_1.expressions.each_with_index do |exp, index|
      expect(root_1[index].object_id).not_to eq exp.object_id
    end

    # regression test
    expect { root_1.clone }.not_to change { root_1.name.object_id }
  end

  specify('Sequence#clone') do
    root = RP.parse(/(a|b)/)
    copy = root.clone

    # regression test
    expect(copy.to_s).to eq root.to_s

    root_seq_op = root[0][0]
    copy_seq_op = copy[0][0]
    root_seq_1 = root[0][0][0]
    copy_seq_1 = copy[0][0][0]

    expect(root_seq_op.object_id).not_to eq copy_seq_op.object_id
    expect(root_seq_1.object_id).not_to eq copy_seq_1.object_id

    copy_seq_1.expressions.each_with_index do |exp, index|
      expect(root_seq_1[index].object_id).not_to eq exp.object_id
    end
  end

  describe('Base#unquantified_clone') do
    it 'produces a clone' do
      root = RP.parse(/^a(b([cde])f)g$/)
      copy = root.unquantified_clone

      expect(copy.to_s).to eq root.to_s

      expect(copy.object_id).not_to eq root.object_id
    end

    it 'does not carry over the callee quantifier' do
      expect(RP.parse(/a{3}/)[0]).to be_quantified
      expect(RP.parse(/a{3}/)[0].unquantified_clone).not_to be_quantified

      expect(RP.parse(/[a]{3}/)[0]).to be_quantified
      expect(RP.parse(/[a]{3}/)[0].unquantified_clone).not_to be_quantified

      expect(RP.parse(/(a|b){3}/)[0]).to
be_quantified
      expect(RP.parse(/(a|b){3}/)[0].unquantified_clone).not_to be_quantified
    end

    it 'keeps quantifiers of callee children' do
      expect(RP.parse(/(a{3}){3}/)[0][0]).to be_quantified
      expect(RP.parse(/(a{3}){3}/)[0].unquantified_clone[0]).to be_quantified
    end
  end
end

# regexp_parser-1.6.0/spec/expression/to_s_spec.rb
require 'spec_helper'

RSpec.describe('Expression#to_s') do
  specify('literal alternation') do
    pattern = 'abcd|ghij|klmn|pqur'

    expect(RP.parse(pattern).to_s).to eq pattern
  end

  specify('quantified alternations') do
    pattern = '(?:a?[b]+(c){2}|d+[e]*(f)?)|(?:g+[h]?(i){2,3}|j*[k]{3,5}(l)?)'

    expect(RP.parse(pattern).to_s).to eq pattern
  end

  specify('quantified sets') do
    pattern = '[abc]+|[^def]{3,6}'

    expect(RP.parse(pattern).to_s).to eq pattern
  end

  specify('property sets') do
    pattern = '[\\a\\b\\p{Lu}\\P{Z}\\c\\d]+'

    expect(RP.parse(pattern, 'ruby/1.9').to_s).to eq pattern
  end

  specify('groups') do
    pattern = "(a(?>b(?:c(?d(?'N'e)??f)+g)*+h)*i)++"

    expect(RP.parse(pattern, 'ruby/1.9').to_s).to eq pattern
  end

  specify('assertions') do
    pattern = '(a+(?=b+(?!c+(?<=d+(?
[:escape, :bell, '\a', 1, 3]

  # not an escape outside a character set
  include_examples 'scan', /c\bt/, 1 => [:anchor, :word_boundary, '\b', 1, 3]

  include_examples 'scan', /c\ft/, 1 => [:escape, :form_feed, '\f', 1, 3]
  include_examples 'scan', /c\nt/, 1 => [:escape, :newline, '\n', 1, 3]
  include_examples 'scan', /c\tt/, 1 => [:escape, :tab, '\t', 1, 3]
  include_examples 'scan', /c\vt/, 1 => [:escape, :vertical_tab, '\v', 1, 3]

  include_examples 'scan', 'c\qt', 1 => [:escape, :literal, '\q', 1, 3]

  include_examples 'scan', 'a\012c', 1 => [:escape, :octal, '\012', 1, 5]
  include_examples 'scan', 'a\0124', 1 => [:escape, :octal, '\012', 1, 5]
  include_examples 'scan', '\712+7', 0 => [:escape, :octal, '\712', 0, 4]

  include_examples 'scan', 'a\x24c', 1 => [:escape, :hex, '\x24', 1, 5]
  include_examples 'scan', 'a\x0640c', 1 => [:escape, :hex, '\x06', 1, 5]

  include_examples 'scan', 'a\u0640c', 1 => [:escape, :codepoint, '\u0640', 1, 7]
  include_examples 'scan', 'a\u{640 0641}c', 1 => [:escape, :codepoint_list, '\u{640 0641}', 1, 13]
  include_examples 'scan', 'a\u{10FFFF}c', 1 => [:escape, :codepoint_list, '\u{10FFFF}', 1, 11]

  include_examples 'scan', /a\cBc/, 1 => [:escape, :control, '\cB', 1, 4]
  include_examples 'scan', /a\c^c/, 1 => [:escape, :control, '\c^', 1, 4]
  include_examples 'scan', /a\c\n/, 1 => [:escape, :control, '\c\n', 1, 5]
  include_examples 'scan', /a\c\\b/, 1 => [:escape, :control, '\c\\\\', 1, 5]
  include_examples 'scan', /a\C-bc/, 1 => [:escape, :control, '\C-b', 1, 5]
  include_examples 'scan', /a\C-^b/, 1 => [:escape, :control, '\C-^', 1, 5]
  include_examples 'scan', /a\C-\nb/, 1 => [:escape, :control, '\C-\n', 1, 6]
  include_examples 'scan', /a\C-\\b/, 1 => [:escape, :control, '\C-\\\\', 1, 6]
  include_examples 'scan', /a\c\M-Bc/n, 1 => [:escape, :control, '\c\M-B', 1, 7]
  include_examples 'scan', /a\C-\M-Bc/n, 1 => [:escape, :control, '\C-\M-B', 1, 8]

  include_examples 'scan', /a\M-Bc/n, 1 => [:escape, :meta_sequence, '\M-B', 1, 5]
  include_examples 'scan',
/a\M-\cBc/n, 1 => [:escape, :meta_sequence, '\M-\cB', 1, 7]
  include_examples 'scan', /a\M-\c^/n, 1 => [:escape, :meta_sequence, '\M-\c^', 1, 7]
  include_examples 'scan', /a\M-\c\n/n, 1 => [:escape, :meta_sequence, '\M-\c\n', 1, 8]
  include_examples 'scan', /a\M-\c\\/n, 1 => [:escape, :meta_sequence, '\M-\c\\\\', 1, 8]
  include_examples 'scan', /a\M-\C-Bc/n, 1 => [:escape, :meta_sequence, '\M-\C-B', 1, 8]
  include_examples 'scan', /a\M-\C-\\/n, 1 => [:escape, :meta_sequence, '\M-\C-\\\\', 1, 9]

  include_examples 'scan', 'ab\\\xcd', 1 => [:escape, :backslash, '\\\\', 2, 4]
  include_examples 'scan', 'ab\\\0cd', 1 => [:escape, :backslash, '\\\\', 2, 4]
  include_examples 'scan', 'ab\\\Kcd', 1 => [:escape, :backslash, '\\\\', 2, 4]

  include_examples 'scan', 'ab\^cd', 1 => [:escape, :bol, '\^', 2, 4]
  include_examples 'scan', 'ab\$cd', 1 => [:escape, :eol, '\$', 2, 4]
  include_examples 'scan', 'ab\[cd', 1 => [:escape, :set_open, '\[', 2, 4]
end

# regexp_parser-1.6.0/spec/scanner/conditionals_spec.rb
require 'spec_helper'

RSpec.describe('Conditional scanning') do
  include_examples 'scan', /(a)(?(1)T|F)1/, 3 => [:conditional, :open, '(?', 3, 5]
  include_examples 'scan', /(a)(?(1)T|F)2/, 4 => [:conditional, :condition_open, '(', 5, 6]
  include_examples 'scan', /(a)(?(1)T|F)3/, 5 => [:conditional, :condition, '1', 6, 7]
  include_examples 'scan', /(a)(?(1)T|F)4/, 6 => [:conditional, :condition_close, ')', 7, 8]
  include_examples 'scan', /(a)(?(1)T|F)5/, 7 => [:literal, :literal, 'T', 8, 9]
  include_examples 'scan', /(a)(?(1)T|F)6/, 8 => [:conditional, :separator, '|', 9, 10]
  include_examples 'scan', /(a)(?(1)T|F)7/, 9 => [:literal, :literal, 'F', 10, 11]
  include_examples 'scan', /(a)(?(1)T|F)8/, 10 => [:conditional, :close, ')', 11, 12]
  include_examples 'scan', /(a)(?(1)TRUE)9/, 8 => [:conditional, :close, ')', 12, 13]
  include_examples 'scan', /(a)(?(1)TRUE|)10/, 8 => [:conditional, :separator, '|', 12, 13]
  include_examples
'scan', /(a)(?(1)TRUE|)11/, 9 => [:conditional, :close, ')', 13, 14]
  include_examples 'scan', /(?<N>A)(?(<N>)T|F)1/, 5 => [:conditional, :condition, '<N>', 10, 13]
  include_examples 'scan', /(?'N'A)(?('N')T|F)2/, 5 => [:conditional, :condition, "'N'", 10, 13]

  include_examples 'scan', /(a(b(c)))(?(1)(?(2)d|(?(3)e|f))|(?(2)(?(1)g|h)))/,
    0 => [:group, :capture, '(', 0, 1],
    1 => [:literal, :literal, 'a', 1, 2],
    2 => [:group, :capture, '(', 2, 3],
    3 => [:literal, :literal, 'b', 3, 4],
    4 => [:group, :capture, '(', 4, 5],
    5 => [:literal, :literal, 'c', 5, 6],
    6 => [:group, :close, ')', 6, 7],
    7 => [:group, :close, ')', 7, 8],
    8 => [:group, :close, ')', 8, 9],
    9 => [:conditional, :open, '(?', 9, 11],
    10 => [:conditional, :condition_open, '(', 11, 12],
    11 => [:conditional, :condition, '1', 12, 13],
    12 => [:conditional, :condition_close, ')', 13, 14],
    13 => [:conditional, :open, '(?', 14, 16],
    14 => [:conditional, :condition_open, '(', 16, 17],
    15 => [:conditional, :condition, '2', 17, 18],
    16 => [:conditional, :condition_close, ')', 18, 19],
    17 => [:literal, :literal, 'd', 19, 20],
    18 => [:conditional, :separator, '|', 20, 21],
    19 => [:conditional, :open, '(?', 21, 23],
    20 => [:conditional, :condition_open, '(', 23, 24],
    21 => [:conditional, :condition, '3', 24, 25],
    22 => [:conditional, :condition_close, ')', 25, 26],
    23 => [:literal, :literal, 'e', 26, 27],
    24 => [:conditional, :separator, '|', 27, 28],
    25 => [:literal, :literal, 'f', 28, 29],
    26 => [:conditional, :close, ')', 29, 30],
    27 => [:conditional, :close, ')', 30, 31],
    28 => [:conditional, :separator, '|', 31, 32],
    29 => [:conditional, :open, '(?', 32, 34],
    30 => [:conditional, :condition_open, '(', 34, 35],
    31 => [:conditional, :condition, '2', 35, 36],
    32 => [:conditional, :condition_close, ')', 36, 37],
    33 => [:conditional, :open, '(?', 37, 39],
    34 => [:conditional, :condition_open, '(', 39, 40],
    35 => [:conditional, :condition, '1', 40, 41],
    36 => [:conditional, :condition_close, ')', 41, 42],
    37 => [:literal,
:literal, 'g', 42, 43], 38 => [:conditional, :separator, '|', 43, 44], 39 => [:literal, :literal, 'h', 44, 45], 40 => [:conditional, :close, ')', 45, 46], 41 => [:conditional, :close, ')', 46, 47], 42 => [:conditional, :close, ')', 47, 48] include_examples 'scan', /((a)|(b)|((?(2)(c(d|e)+)?|(?(3)f|(?(4)(g|(h)(i)))))))/, 0 => [:group, :capture, '(', 0, 1], 1 => [:group, :capture, '(', 1, 2], 2 => [:literal, :literal, 'a', 2, 3], 3 => [:group, :close, ')', 3, 4], 4 => [:meta, :alternation, '|', 4, 5], 5 => [:group, :capture, '(', 5, 6], 6 => [:literal, :literal, 'b', 6, 7], 7 => [:group, :close, ')', 7, 8], 8 => [:meta, :alternation, '|', 8, 9], 9 => [:group, :capture, '(', 9, 10], 10 => [:conditional, :open, '(?', 10, 12], 11 => [:conditional, :condition_open, '(', 12, 13], 12 => [:conditional, :condition, '2', 13, 14], 13 => [:conditional, :condition_close, ')', 14, 15], 14 => [:group, :capture, '(', 15, 16], 15 => [:literal, :literal, 'c', 16, 17], 16 => [:group, :capture, '(', 17, 18], 17 => [:literal, :literal, 'd', 18, 19], 18 => [:meta, :alternation, '|', 19, 20], 19 => [:literal, :literal, 'e', 20, 21], 20 => [:group, :close, ')', 21, 22], 21 => [:quantifier, :one_or_more, '+', 22, 23], 22 => [:group, :close, ')', 23, 24], 23 => [:quantifier, :zero_or_one, '?', 24, 25], 24 => [:conditional, :separator, '|', 25, 26], 25 => [:conditional, :open, '(?', 26, 28], 26 => [:conditional, :condition_open, '(', 28, 29], 27 => [:conditional, :condition, '3', 29, 30], 28 => [:conditional, :condition_close, ')', 30, 31], 29 => [:literal, :literal, 'f', 31, 32], 30 => [:conditional, :separator, '|', 32, 33], 31 => [:conditional, :open, '(?', 33, 35], 32 => [:conditional, :condition_open, '(', 35, 36], 33 => [:conditional, :condition, '4', 36, 37], 34 => [:conditional, :condition_close, ')', 37, 38], 35 => [:group, :capture, '(', 38, 39], 36 => [:literal, :literal, 'g', 39, 40], 37 => [:meta, :alternation, '|', 40, 41], 38 => [:group, :capture, '(', 41, 42], 39 => [:literal, 
:literal, 'h', 42, 43], 40 => [:group, :close, ')', 43, 44], 41 => [:group, :capture, '(', 44, 45], 42 => [:literal, :literal, 'i', 45, 46], 43 => [:group, :close, ')', 46, 47], 44 => [:group, :close, ')', 47, 48], 45 => [:conditional, :close, ')', 48, 49], 46 => [:conditional, :close, ')', 49, 50], 47 => [:conditional, :close, ')', 50, 51], 48 => [:group, :close, ')', 51, 52], 49 => [:group, :close, ')', 52, 53] include_examples 'scan', /(a)(?(1)(b|c|d)|(e|f|g))(h)(?(2)(i|j|k)|(l|m|n))|o|p/, 9 => [:meta, :alternation, '|', 10, 11], 11 => [:meta, :alternation, '|', 12, 13], 14 => [:conditional, :separator, '|', 15, 16], 17 => [:meta, :alternation, '|', 18, 19], 19 => [:meta, :alternation, '|', 20, 21], 32 => [:meta, :alternation, '|', 34, 35], 34 => [:meta, :alternation, '|', 36, 37], 37 => [:conditional, :separator, '|', 39, 40], 40 => [:meta, :alternation, '|', 42, 43], 42 => [:meta, :alternation, '|', 44, 45], 46 => [:meta, :alternation, '|', 48, 49], 48 => [:meta, :alternation, '|', 50, 51] end regexp_parser-1.6.0/spec/scanner/free_space_spec.rb0000644000004100000410000001401013541126476022437 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('FreeSpace scanning') do describe('scan free space tokens') do let(:tokens) { RS.scan(/ a b ? c * d {2,3} e + | f + /x) } 0.upto(24).select(&:even?).each do |i| it "scans #{i} as free space" do expect(tokens[i][0]).to eq :free_space expect(tokens[i][1]).to eq :whitespace end end 0.upto(24).reject(&:even?).each do |i| it "does not scan #{i} as free space" do expect(tokens[i][0]).not_to eq :free_space expect(tokens[i][1]).not_to eq :whitespace end end it 'sets the correct text' do [0, 2, 10, 14].each { |i| expect(tokens[i][2]).to eq "\n " } [4, 6, 8, 12].each { |i| expect(tokens[i][2]).to eq ' ' } end end describe('scan free space comments') do include_examples 'scan', / a + # A + comment b ? # B ? 
comment c {2,3} # C {2,3} comment d + | e + # D|E comment /x, 5 => [:free_space, :comment, "# A + comment\n", 11, 25], 11 => [:free_space, :comment, "# B ? comment\n", 37, 51], 17 => [:free_space, :comment, "# C {2,3} comment\n", 66, 84], 29 => [:free_space, :comment, "# D|E comment\n", 100, 114] end describe('scan free space inlined') do include_examples 'scan', /a b(?x:c d e)f g/, 0 => [:literal, :literal, 'a b', 0, 3], 1 => [:group, :options, '(?x:', 3, 7], 2 => [:literal, :literal, 'c', 7, 8], 3 => [:free_space, :whitespace, ' ', 8, 9], 4 => [:literal, :literal, 'd', 9, 10], 5 => [:free_space, :whitespace, ' ', 10, 11], 6 => [:literal, :literal, 'e', 11, 12], 7 => [:group, :close, ')', 12, 13], 8 => [:literal, :literal, 'f g', 13, 16] end describe('scan free space nested') do include_examples 'scan', /a b(?x:c d(?-x:e f)g h)i j/, 0 => [:literal, :literal, 'a b', 0, 3], 1 => [:group, :options, '(?x:', 3, 7], 2 => [:literal, :literal, 'c', 7, 8], 3 => [:free_space, :whitespace, ' ', 8, 9], 4 => [:literal, :literal, 'd', 9, 10], 5 => [:group, :options, '(?-x:', 10, 15], 6 => [:literal, :literal, 'e f', 15, 18], 7 => [:group, :close, ')', 18, 19], 8 => [:literal, :literal, 'g', 19, 20], 9 => [:free_space, :whitespace, ' ', 20, 21], 10 => [:literal, :literal, 'h', 21, 22], 11 => [:group, :close, ')', 22, 23], 12 => [:literal, :literal, 'i j', 23, 26] end describe('scan free space nested groups') do include_examples 'scan', /(a (b(?x: (c d) (?-x:(e f) )g) h)i j)/, 0 => [:group, :capture, '(', 0, 1], 1 => [:literal, :literal, 'a ', 1, 3], 2 => [:group, :capture, '(', 3, 4], 3 => [:literal, :literal, 'b', 4, 5], 4 => [:group, :options, '(?x:', 5, 9], 5 => [:free_space, :whitespace, ' ', 9, 10], 6 => [:group, :capture, '(', 10, 11], 7 => [:literal, :literal, 'c', 11, 12], 8 => [:free_space, :whitespace, ' ', 12, 13], 9 => [:literal, :literal, 'd', 13, 14], 10 => [:group, :close, ')', 14, 15], 11 => [:free_space, :whitespace, ' ', 15, 16], 12 => [:group, :options, 
'(?-x:', 16, 21], 13 => [:group, :capture, '(', 21, 22], 14 => [:literal, :literal, 'e f', 22, 25], 15 => [:group, :close, ')', 25, 26], 16 => [:literal, :literal, ' ', 26, 27], 17 => [:group, :close, ')', 27, 28], 18 => [:literal, :literal, 'g', 28, 29], 19 => [:group, :close, ')', 29, 30], 20 => [:literal, :literal, ' h', 30, 32], 21 => [:group, :close, ')', 32, 33], 22 => [:literal, :literal, 'i j', 33, 36], 23 => [:group, :close, ')', 36, 37] end describe('scan free space switch groups') do include_examples 'scan', /(a (b((?x) (c d) ((?-x)(e f) )g) h)i j)/, 0 => [:group, :capture, '(', 0, 1], 1 => [:literal, :literal, 'a ', 1, 3], 2 => [:group, :capture, '(', 3, 4], 3 => [:literal, :literal, 'b', 4, 5], 4 => [:group, :capture, '(', 5, 6], 5 => [:group, :options_switch, '(?x', 6, 9], 6 => [:group, :close, ')', 9, 10], 7 => [:free_space, :whitespace, ' ', 10, 11], 8 => [:group, :capture, '(', 11, 12], 9 => [:literal, :literal, 'c', 12, 13], 10 => [:free_space, :whitespace, ' ', 13, 14], 11 => [:literal, :literal, 'd', 14, 15], 12 => [:group, :close, ')', 15, 16], 13 => [:free_space, :whitespace, ' ', 16, 17], 14 => [:group, :capture, '(', 17, 18], 15 => [:group, :options_switch, '(?-x', 18, 22], 16 => [:group, :close, ')', 22, 23], 17 => [:group, :capture, '(', 23, 24], 18 => [:literal, :literal, 'e f', 24, 27], 19 => [:group, :close, ')', 27, 28], 20 => [:literal, :literal, ' ', 28, 29], 21 => [:group, :close, ')', 29, 30], 22 => [:literal, :literal, 'g', 30, 31], 23 => [:group, :close, ')', 31, 32], 24 => [:literal, :literal, ' h', 32, 34], 25 => [:group, :close, ')', 34, 35], 26 => [:literal, :literal, 'i j', 35, 38], 27 => [:group, :close, ')', 38, 39] end end regexp_parser-1.6.0/spec/scanner/meta_spec.rb0000644000004100000410000000160613541126476021300 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Meta scanning') do include_examples 'scan', /abc??|def*+|ghi+/, 0 => [:literal, :literal, 'abc', 0, 3], 1 => [:quantifier, :zero_or_one_reluctant, 
'??', 3, 5],
    2 => [:meta, :alternation, '|', 5, 6],
    3 => [:literal, :literal, 'def', 6, 9],
    4 => [:quantifier, :zero_or_more_possessive, '*+', 9, 11],
    5 => [:meta, :alternation, '|', 11, 12]

  include_examples 'scan', /(a\|b)|(c|d)\|(e[|]f)/,
    2 => [:escape, :alternation, '\|', 2, 4],
    5 => [:meta, :alternation, '|', 6, 7],
    8 => [:meta, :alternation, '|', 9, 10],
    11 => [:escape, :alternation, '\|', 12, 14],
    15 => [:literal, :literal, '|', 17, 18]
end

# regexp_parser-1.6.0/spec/scanner/all_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Scanner) do
  specify('scanner returns an array') do
    expect(RS.scan('abc')).to be_instance_of(Array)
  end

  specify('scanner returns tokens as arrays') do
    tokens = RS.scan('^abc+[^one]{2,3}\\b\\d\\\\C-C$')
    expect(tokens).to all(be_a Array)
    expect(tokens.map(&:length)).to all(eq 5)
  end

  specify('scanner token count') do
    re = /^(one|two){2,3}([^d\]efm-qz\,\-]*)(ghi)+$/i
    expect(RS.scan(re).length).to eq 28
  end
end

# regexp_parser-1.6.0/spec/scanner/keep_spec.rb
require 'spec_helper'

RSpec.describe('Keep scanning') do
  include_examples 'scan', /ab\Kcd/,
    1 => [:keep, :mark, '\K', 2, 4]

  include_examples 'scan', /(a\Kb)|(c\\\Kd)ef/,
    2 => [:keep, :mark, '\K', 2, 4],
    9 => [:keep, :mark, '\K', 11, 13]
end

# regexp_parser-1.6.0/spec/scanner/sets_spec.rb
require 'spec_helper'

RSpec.describe('Set scanning') do
  include_examples 'scan', /[a]/, 0 => [:set, :open, '[', 0, 1]
  include_examples 'scan', /[b]/, 2 => [:set, :close, ']', 2, 3]
  include_examples 'scan', /[^n]/, 1 => [:set, :negate, '^', 1, 2]

  include_examples 'scan', /[c]/, 1 => [:literal, :literal, 'c', 1, 2]
  include_examples 'scan', /[\b]/, 1 => [:escape, :backspace, '\b', 1, 3]
  include_examples 'scan', /[A\bX]/, 2 => [:escape, :backspace, '\b', 2, 4]

  include_examples 'scan', /[.]/, 1 =>
[:literal, :literal, '.', 1, 2]
  include_examples 'scan', /[?]/, 1 => [:literal, :literal, '?', 1, 2]
  include_examples 'scan', /[*]/, 1 => [:literal, :literal, '*', 1, 2]
  include_examples 'scan', /[+]/, 1 => [:literal, :literal, '+', 1, 2]
  include_examples 'scan', /[{]/, 1 => [:literal, :literal, '{', 1, 2]
  include_examples 'scan', /[}]/, 1 => [:literal, :literal, '}', 1, 2]
  include_examples 'scan', /[<]/, 1 => [:literal, :literal, '<', 1, 2]
  include_examples 'scan', /[>]/, 1 => [:literal, :literal, '>', 1, 2]

  include_examples 'scan', /[äöü]/, 2 => [:literal, :literal, 'ö', 3, 5]

  include_examples 'scan', /[\x20]/, 1 => [:escape, :hex, '\x20', 1, 5]

  include_examples 'scan', '[\.]', 1 => [:escape, :dot, '\.', 1, 3]
  include_examples 'scan', '[\!]', 1 => [:escape, :literal, '\!', 1, 3]
  include_examples 'scan', '[\#]', 1 => [:escape, :literal, '\#', 1, 3]
  include_examples 'scan', '[\\]]', 1 => [:escape, :set_close, '\]', 1, 3]
  include_examples 'scan', '[\\\\]', 1 => [:escape, :backslash, '\\\\', 1, 3]
  include_examples 'scan', '[\A]', 1 => [:escape, :literal, '\A', 1, 3]
  include_examples 'scan', '[\z]', 1 => [:escape, :literal, '\z', 1, 3]
  include_examples 'scan', '[\g]', 1 => [:escape, :literal, '\g', 1, 3]
  include_examples 'scan', '[\K]', 1 => [:escape, :literal, '\K', 1, 3]
  include_examples 'scan', '[\R]', 1 => [:escape, :literal, '\R', 1, 3]
  include_examples 'scan', '[\X]', 1 => [:escape, :literal, '\X', 1, 3]
  include_examples 'scan', '[\c2]', 1 => [:escape, :literal, '\c', 1, 3]
  include_examples 'scan', '[\B]', 1 => [:escape, :literal, '\B', 1, 3]
  include_examples 'scan', '[a\-c]', 2 => [:escape, :literal, '\-', 2, 4]

  include_examples 'scan', /[\d]/, 1 => [:type, :digit, '\d', 1, 3]
  include_examples 'scan', /[\da-z]/, 1 => [:type, :digit, '\d', 1, 3]
  include_examples 'scan', /[\D]/, 1 => [:type, :nondigit, '\D', 1, 3]

  include_examples 'scan', /[\h]/, 1 => [:type, :hex, '\h', 1, 3]
  include_examples 'scan', /[\H]/, 1 => [:type, :nonhex, '\H', 1, 3]

  include_examples 'scan', /[\s]/, 1 => [:type, :space, '\s', 1, 3]
  include_examples 'scan', /[\S]/, 1 => [:type, :nonspace, '\S', 1, 3]

  include_examples 'scan', /[\w]/, 1 => [:type, :word, '\w', 1, 3]
  include_examples 'scan', /[\W]/, 1 => [:type, :nonword, '\W', 1, 3]

  include_examples 'scan', /[a-b]/, 1 => [:literal, :literal, 'a', 1, 2]
  include_examples 'scan', /[a-c]/, 2 => [:set, :range, '-', 2, 3]
  include_examples 'scan', /[a-d]/, 3 => [:literal, :literal, 'd', 3, 4]
  include_examples 'scan', /[a-b-]/, 4 => [:literal, :literal, '-', 4, 5]
  include_examples 'scan', /[-a]/, 1 => [:literal, :literal, '-', 1, 2]
  include_examples 'scan', /[a-c^]/, 4 => [:literal, :literal, '^', 4, 5]
  include_examples 'scan', /[a-bd-f]/, 2 => [:set, :range, '-', 2, 3]
  include_examples 'scan', /[a-cd-f]/, 5 => [:set, :range, '-', 5, 6]

  include_examples 'scan', /[a[:digit:]c]/, 2 => [:posixclass, :digit, '[:digit:]', 2, 11]
  include_examples 'scan', /[[:digit:][:space:]]/, 2 => [:posixclass, :space, '[:space:]', 10, 19]
  include_examples 'scan', /[[:^digit:]]/, 1 => [:nonposixclass, :digit, '[:^digit:]', 1, 11]

  include_examples 'scan', /[a[.a-b.]c]/, 2 => [:set, :collation, '[.a-b.]', 2, 9]
  include_examples 'scan', /[a[=e=]c]/, 2 => [:set, :equivalent, '[=e=]', 2, 7]

  include_examples 'scan', /[a-d&&g-h]/, 4 => [:set, :intersection, '&&', 4, 6]
  include_examples 'scan', /[a&&]/, 2 => [:set, :intersection, '&&', 2, 4]
  include_examples 'scan', /[&&z]/, 1 => [:set, :intersection, '&&', 1, 3]

  include_examples 'scan', /[a\p{digit}c]/, 2 => [:property, :digit, '\p{digit}', 2, 11]
  include_examples 'scan', /[a\P{digit}c]/, 2 => [:nonproperty, :digit, '\P{digit}', 2, 11]
  include_examples 'scan', /[a\p{^digit}c]/, 2 => [:nonproperty, :digit, '\p{^digit}', 2, 12]
  include_examples 'scan', /[a\P{^digit}c]/, 2 => [:property, :digit, '\P{^digit}', 2, 12]

  include_examples 'scan', /[a\p{ALPHA}c]/, 2 => [:property, :alpha, '\p{ALPHA}', 2, 11]
  include_examples 'scan', /[a\p{P}c]/, 2 => [:property, :punctuation, '\p{P}', 2, 7]
  include_examples 'scan', /[a\p{P}\P{P}c]/, 3 => [:nonproperty, :punctuation, '\P{P}', 7, 12]

  include_examples 'scan', /[\x20-\x27]/,
    1 => [:escape, :hex, '\x20', 1, 5],
    2 => [:set, :range, '-', 5, 6],
    3 => [:escape, :hex, '\x27', 6, 10]

  include_examples 'scan', /[a-w&&[^c-g]z]/,
    5 => [:set, :open, '[', 6, 7],
    6 => [:set, :negate, '^', 7, 8],
    8 => [:set, :range, '-', 9, 10],
    10 => [:set, :close, ']', 11, 12]

  specify('set literal encoding') do
    text = RS.scan('[a]')[1][2].to_s
    expect(text).to eq 'a'
    expect(text.encoding.to_s).to eq 'UTF-8'

    text = RS.scan("[\u{1F632}]")[1][2].to_s
    expect(text).to eq "\u{1F632}"
    expect(text.encoding.to_s).to eq 'UTF-8'
  end
end

# regexp_parser-1.6.0/spec/scanner/quantifiers_spec.rb
require 'spec_helper'

RSpec.describe('Quantifier scanning') do
  include_examples 'scan', 'a?', 1 => [:quantifier, :zero_or_one, '?', 1, 2]
  include_examples 'scan', 'a??', 1 => [:quantifier, :zero_or_one_reluctant, '??', 1, 3]
  include_examples 'scan', 'a?+', 1 => [:quantifier, :zero_or_one_possessive, '?+', 1, 3]

  include_examples 'scan', 'a*', 1 => [:quantifier, :zero_or_more, '*', 1, 2]
  include_examples 'scan', 'a*?', 1 => [:quantifier, :zero_or_more_reluctant, '*?', 1, 3]
  include_examples 'scan', 'a*+', 1 => [:quantifier, :zero_or_more_possessive, '*+', 1, 3]

  include_examples 'scan', 'a+', 1 => [:quantifier, :one_or_more, '+', 1, 2]
  include_examples 'scan', 'a+?', 1 => [:quantifier, :one_or_more_reluctant, '+?', 1, 3]
  include_examples 'scan', 'a++', 1 => [:quantifier, :one_or_more_possessive, '++', 1, 3]

  include_examples 'scan', 'a{2}', 1 => [:quantifier, :interval, '{2}', 1, 4]
  include_examples 'scan', 'a{2,}', 1 => [:quantifier, :interval, '{2,}', 1, 5]
  include_examples 'scan', 'a{,2}', 1 => [:quantifier, :interval, '{,2}', 1, 5]
  include_examples 'scan', 'a{2,4}', 1 => [:quantifier, :interval, '{2,4}', 1, 6]
end
# regexp_parser-1.6.0/spec/scanner/refcalls_spec.rb
require 'spec_helper'

RSpec.describe('RefCall scanning') do
  # Traditional numerical group back-reference
  include_examples 'scan', '(abc)\1', 3 => [:backref, :number, '\1', 5, 7]

  # Group back-references, named, numbered, and relative
  include_examples 'scan', '(?<X>abc)\k<X>', 3 => [:backref, :name_ref_ab, '\k<X>', 9, 14]
  include_examples 'scan', "(?<X>abc)\\k'X'", 3 => [:backref, :name_ref_sq, "\\k'X'", 9, 14]

  include_examples 'scan', '(abc)\k<1>', 3 => [:backref, :number_ref_ab, '\k<1>', 5, 10]
  include_examples 'scan', "(abc)\\k'1'", 3 => [:backref, :number_ref_sq, "\\k'1'", 5, 10]

  include_examples 'scan', '(abc)\k<-1>', 3 => [:backref, :number_rel_ref_ab, '\k<-1>', 5, 11]
  include_examples 'scan', "(abc)\\k'-1'", 3 => [:backref, :number_rel_ref_sq, "\\k'-1'", 5, 11]

  # Sub-expression invocation, named, numbered, and relative
  include_examples 'scan', '(?<X>abc)\g<X>', 3 => [:backref, :name_call_ab, '\g<X>', 9, 14]
  include_examples 'scan', "(?<X>abc)\\g'X'", 3 => [:backref, :name_call_sq, "\\g'X'", 9, 14]

  include_examples 'scan', '(abc)\g<1>', 3 => [:backref, :number_call_ab, '\g<1>', 5, 10]
  include_examples 'scan', "(abc)\\g'1'", 3 => [:backref, :number_call_sq, "\\g'1'", 5, 10]

  include_examples 'scan', '(abc)\g<-1>', 3 => [:backref, :number_rel_call_ab, '\g<-1>', 5, 11]
  include_examples 'scan', "(abc)\\g'-1'", 3 => [:backref, :number_rel_call_sq, "\\g'-1'", 5, 11]

  include_examples 'scan', '\g<+1>(abc)', 0 => [:backref, :number_rel_call_ab, '\g<+1>', 0, 6]
  include_examples 'scan', "\\g'+1'(abc)", 0 => [:backref, :number_rel_call_sq, "\\g'+1'", 0, 6]

  # Group back-references, with recursion level
  include_examples 'scan', '(?<X>abc)\k<X-0>', 3 => [:backref, :name_recursion_ref_ab, '\k<X-0>', 9, 16]
  include_examples 'scan', "(?<X>abc)\\k'X-0'", 3 => [:backref, :name_recursion_ref_sq, "\\k'X-0'", 9, 16]

  include_examples 'scan', '(abc)\k<1-0>', 3 => [:backref, :number_recursion_ref_ab,
'\k<1-0>', 5, 12]
  include_examples 'scan', "(abc)\\k'1-0'", 3 => [:backref, :number_recursion_ref_sq, "\\k'1-0'", 5, 12]
end

# regexp_parser-1.6.0/spec/scanner/errors_spec.rb
require 'spec_helper'

RSpec.describe(Regexp::Scanner) do
  RSpec.shared_examples 'scan error' do |error, issue, source|
    it "raises #{error} for #{issue} `#{source}`" do
      expect { RS.scan(source) }.to raise_error(error)
    end
  end

  include_examples 'scan error', RS::PrematureEndError, 'unbalanced set', '[a'
  include_examples 'scan error', RS::PrematureEndError, 'unbalanced set', '[[:alpha:]'
  include_examples 'scan error', RS::PrematureEndError, 'unbalanced group', '(abc'
  include_examples 'scan error', RS::PrematureEndError, 'unbalanced interval', 'a{1,2'
  include_examples 'scan error', RS::PrematureEndError, 'eof in property', '\p{asci'
  include_examples 'scan error', RS::PrematureEndError, 'incomplete property', '\p{ascii abc'
  include_examples 'scan error', RS::PrematureEndError, 'eof options', '(?mix'
  include_examples 'scan error', RS::PrematureEndError, 'eof escape', '\\'
  include_examples 'scan error', RS::PrematureEndError, 'eof in hex escape', '\x'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u0'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u00'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u000'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u{'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u{00'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u{0000'
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u{0000 '
  include_examples 'scan error', RS::PrematureEndError, 'eof in cp escape', '\u{0000 0000'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\c'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\c\M'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\c\M-'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\C'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\C-'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\C-\M'
  include_examples 'scan error', RS::PrematureEndError, 'eof in c-seq', '\C-\M-'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M-'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M-\\'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M-\c'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M-\C'
  include_examples 'scan error', RS::PrematureEndError, 'eof in m-seq', '\M-\C-'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid hex', '\xZ'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid hex', '\xZ0'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid c-seq', '\cü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid c-seq', '\c\M-ü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid c-seq', '\C-ü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid c-seq', '\C-\M-ü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid m-seq', '\M-ü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid m-seq', '\M-\cü'
  include_examples 'scan error', RS::InvalidSequenceError, 'invalid m-seq', '\M-\C-ü'
  include_examples 'scan error', RS::ScannerError, 'invalid c-seq', '\Ca'
  include_examples 'scan error', RS::ScannerError, 'invalid m-seq', '\Ma'
  include_examples 'scan error', RS::InvalidGroupError, 'invalid group', "(?'')"
  include_examples 'scan error', RS::InvalidGroupError, 'invalid group', "(?''empty-name)"
  include_examples 'scan error', RS::InvalidGroupError, 'invalid group', '(?<>)'
  include_examples 'scan error', RS::InvalidGroupError, 'invalid group', '(?<>empty-name)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid option', '(?foo)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid option', '(?mix abc)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid option', '(?mix^bc'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid option', '(?)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid neg option', '(?-foo)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid neg option', '(?-u)'
  include_examples 'scan error', RS::InvalidGroupOption, 'invalid neg option', '(?-mixu)'
  include_examples 'scan error', RS::InvalidBackrefError, 'empty backref', '\k<>'
  include_examples 'scan error', RS::InvalidBackrefError, 'empty backref', '\k\'\''
  include_examples 'scan error', RS::InvalidBackrefError, 'empty refcall', '\g<>'
  include_examples 'scan error', RS::InvalidBackrefError, 'empty refcall', '\g\'\''
  include_examples 'scan error', RS::UnknownUnicodePropertyError, 'unknown property', '\p{foobar}'
end

# regexp_parser-1.6.0/spec/scanner/types_spec.rb
require 'spec_helper'

RSpec.describe('Type scanning') do
  include_examples 'scan', 'a\\dc', 1 => [:type, :digit, '\\d', 1, 3]
  include_examples 'scan', 'a\\Dc', 1 => [:type, :nondigit, '\\D', 1, 3]
  include_examples 'scan', 'a\\hc', 1 => [:type, :hex, '\\h', 1, 3]
  include_examples 'scan', 'a\\Hc', 1 => [:type, :nonhex, '\\H', 1, 3]
  include_examples 'scan', 'a\\sc', 1 => [:type, :space, '\\s', 1, 3]
  include_examples 'scan', 'a\\Sc', 1 => [:type, :nonspace, '\\S', 1, 3]
  include_examples 'scan', 'a\\wc', 1 => [:type, :word, '\\w', 1, 3]
  include_examples 'scan', 'a\\Wc', 1 => [:type, :nonword, '\\W', 1, 3]
  include_examples 'scan', 'a\\Rc', 1 => [:type, :linebreak, '\\R', 1, 3]
  include_examples
'scan', 'a\\Xc', 1 => [:type, :xgrapheme, '\\X', 1, 3]
end

# regexp_parser-1.6.0/spec/scanner/properties_spec.rb
require 'spec_helper'

RSpec.describe('Property scanning') do
  RSpec.shared_examples 'scan property' do |text, token|
    it("scans \\p{#{text}} as property #{token}") do
      result = RS.scan("\\p{#{text}}")[0]
      expect(result[0..1]).to eq [:property, token]
    end

    it("scans \\P{#{text}} as nonproperty #{token}") do
      result = RS.scan("\\P{#{text}}")[0]
      expect(result[0..1]).to eq [:nonproperty, token]
    end

    it("scans \\p{^#{text}} as nonproperty #{token}") do
      result = RS.scan("\\p{^#{text}}")[0]
      expect(result[0..1]).to eq [:nonproperty, token]
    end

    it("scans double-negated \\P{^#{text}} as property #{token}") do
      result = RS.scan("\\P{^#{text}}")[0]
      expect(result[0..1]).to eq [:property, token]
    end
  end

  include_examples 'scan property', 'Alnum', :alnum

  include_examples 'scan property', 'XPosixPunct', :xposixpunct

  include_examples 'scan property', 'Newline', :newline

  include_examples 'scan property', 'Any', :any

  include_examples 'scan property', 'Assigned', :assigned

  include_examples 'scan property', 'Age=1.1', :'age=1.1'
  include_examples 'scan property', 'Age=10.0', :'age=10.0'

  include_examples 'scan property', 'ahex', :ascii_hex_digit
  include_examples 'scan property', 'ASCII_Hex_Digit', :ascii_hex_digit # test underscore

  include_examples 'scan property', 'sd', :soft_dotted
  include_examples 'scan property', 'Soft-Dotted', :soft_dotted # test dash

  include_examples 'scan property', 'Egyp', :egyptian_hieroglyphs
  include_examples 'scan property', 'Egyptian Hieroglyphs', :egyptian_hieroglyphs # test whitespace

  include_examples 'scan property', 'Linb', :linear_b
  include_examples 'scan property', 'Linear-B', :linear_b # test dash

  include_examples 'scan property', 'InArabic', :in_arabic # test block
  include_examples 'scan property', 'in Arabic', :in_arabic # test block w. whitespace
  include_examples 'scan property', 'In_Arabic', :in_arabic # test block w. underscore

  include_examples 'scan property', 'Yiii', :yi
  include_examples 'scan property', 'Yi', :yi

  include_examples 'scan property', 'Zinh', :inherited
  include_examples 'scan property', 'Inherited', :inherited
  include_examples 'scan property', 'Qaai', :inherited

  include_examples 'scan property', 'Zzzz', :unknown
  include_examples 'scan property', 'Unknown', :unknown
end

# regexp_parser-1.6.0/spec/scanner/groups_spec.rb
require 'spec_helper'

RSpec.describe('Group scanning') do
  # Group types
  include_examples 'scan', '(?>abc)', 0 => [:group, :atomic, '(?>', 0, 3]
  include_examples 'scan', '(abc)', 0 => [:group, :capture, '(', 0, 1]

  include_examples 'scan', '(?<name>abc)', 0 => [:group, :named_ab, '(?<name>', 0, 8]
  include_examples 'scan', "(?'name'abc)", 0 => [:group, :named_sq, "(?'name'", 0, 8]

  include_examples 'scan', '(?<name_1>abc)', 0 => [:group, :named_ab, '(?<name_1>', 0, 10]
  include_examples 'scan', "(?'name_1'abc)", 0 => [:group, :named_sq, "(?'name_1'", 0, 10]

  include_examples 'scan', '(?:abc)', 0 => [:group, :passive, '(?:', 0, 3]
  include_examples 'scan', '(?:)', 0 => [:group, :passive, '(?:', 0, 3]
  include_examples 'scan', '(?::)', 0 => [:group, :passive, '(?:', 0, 3]

  # Comments
  include_examples 'scan', '(?#abc)', 0 => [:group, :comment, '(?#abc)', 0, 7]
  include_examples 'scan', '(?#)', 0 => [:group, :comment, '(?#)', 0, 4]

  # Assertions
  include_examples 'scan', '(?=abc)', 0 => [:assertion, :lookahead, '(?=', 0, 3]
  include_examples 'scan', '(?!abc)', 0 => [:assertion, :nlookahead, '(?!', 0, 3]
  include_examples 'scan', '(?<=abc)', 0 => [:assertion, :lookbehind, '(?<=', 0, 4]
  include_examples 'scan', '(?<!abc)', 0 => [:assertion, :nlookbehind, '(?<!', 0, 4]

  # Options
  include_examples 'scan', '(?-mix:abc)', 0 => [:group, :options, '(?-mix:', 0, 7]
  include_examples 'scan', '(?m-ix:abc)', 0 => [:group, :options, '(?m-ix:', 0, 7]
  include_examples 'scan', '(?mi-x:abc)', 0 => [:group, :options, '(?mi-x:', 0, 7]
  include_examples 'scan', '(?mix:abc)', 0 => [:group, :options, '(?mix:', 0, 6]
  include_examples 'scan', '(?m:)', 0 => [:group, :options, '(?m:', 0, 4]
  include_examples 'scan', '(?i:)', 0 => [:group, :options, '(?i:', 0, 4]
  include_examples 'scan', '(?x:)', 0 => [:group, :options, '(?x:', 0, 4]
  include_examples 'scan', '(?mix)', 0 => [:group, :options_switch, '(?mix', 0, 5]
  include_examples 'scan', '(?d-mix:abc)', 0 => [:group, :options, '(?d-mix:', 0, 8]
  include_examples 'scan', '(?a-mix:abc)', 0 => [:group, :options, '(?a-mix:', 0, 8]
  include_examples 'scan', '(?u-mix:abc)', 0 => [:group, :options, '(?u-mix:', 0, 8]
  include_examples 'scan', '(?da-m:abc)', 0 => [:group, :options, '(?da-m:', 0, 7]
  include_examples 'scan', '(?du-x:abc)', 0 => [:group, :options, '(?du-x:', 0, 7]
  include_examples 'scan', '(?dau-i:abc)', 0 => [:group, :options, '(?dau-i:', 0, 8]
  include_examples 'scan', '(?dau:abc)', 0 => [:group, :options, '(?dau:', 0, 6]
  include_examples 'scan', '(?d:)', 0 => [:group, :options, '(?d:', 0, 4]
  include_examples 'scan', '(?a:)', 0 => [:group, :options, '(?a:', 0, 4]
  include_examples 'scan', '(?u:)', 0 => [:group, :options, '(?u:', 0, 4]
  include_examples 'scan', '(?dau)', 0 => [:group, :options_switch, '(?dau', 0, 5]

  if ruby_version_at_least('2.4.1')
    include_examples 'scan', '(?~abc)', 0 => [:group, :absence, '(?~', 0, 3]
  end
end

# regexp_parser-1.6.0/spec/scanner/anchors_spec.rb
require 'spec_helper'

RSpec.describe('Anchor scanning') do
  include_examples 'scan', '^abc', 0 => [:anchor, :bol, '^', 0, 1]
  include_examples 'scan', 'abc$', 1 => [:anchor, :eol, '$', 3, 4]

  include_examples 'scan', '\Aabc', 0 => [:anchor, :bos, '\A', 0, 2]
  include_examples 'scan', 'abc\z', 1 => [:anchor, :eos, '\z', 3,
5] include_examples 'scan', 'abc\Z', 1 => [:anchor, :eos_ob_eol, '\Z', 3, 5] include_examples 'scan', 'a\bc', 1 => [:anchor, :word_boundary, '\b', 1, 3] include_examples 'scan', 'a\Bc', 1 => [:anchor, :nonword_boundary, '\B', 1, 3] include_examples 'scan', 'a\Gc', 1 => [:anchor, :match_start, '\G', 1, 3] include_examples 'scan', "\\\\Ac", 0 => [:escape, :backslash, '\\\\', 0, 2] include_examples 'scan', "a\\\\z", 1 => [:escape, :backslash, '\\\\', 1, 3] include_examples 'scan', "a\\\\Z", 1 => [:escape, :backslash, '\\\\', 1, 3] include_examples 'scan', "a\\\\bc", 1 => [:escape, :backslash, '\\\\', 1, 3] include_examples 'scan', "a\\\\Bc", 1 => [:escape, :backslash, '\\\\', 1, 3] end regexp_parser-1.6.0/spec/scanner/literals_spec.rb0000644000004100000410000000715613541126476022177 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('UTF8 scanning') do # ascii, single byte characters include_examples 'scan', 'a', 0 => [:literal, :literal, 'a', 0, 1] include_examples 'scan', 'ab+', 0 => [:literal, :literal, 'ab', 0, 2] include_examples 'scan', 'ab+', 1 => [:quantifier, :one_or_more, '+', 2, 3] # 2 byte wide characters, Arabic include_examples 'scan', 'aاbبcت', 0 => [:literal, :literal, 'aاbبcت', 0, 9] include_examples 'scan', 'aاbبت?', 0 => [:literal, :literal, 'aاbبت', 0, 8] include_examples 'scan', 'aاbبت?', 1 => [:quantifier, :zero_or_one, '?', 8, 9] include_examples 'scan', 'aا?bبcت+', 0 => [:literal, :literal, 'aا', 0, 3] include_examples 'scan', 'aا?bبcت+', 1 => [:quantifier, :zero_or_one, '?', 3, 4] include_examples 'scan', 'aا?bبcت+', 2 => [:literal, :literal, 'bبcت', 4, 10] include_examples 'scan', 'aا?bبcت+', 3 => [:quantifier, :one_or_more, '+', 10, 11] include_examples 'scan', 'a(اbب+)cت?', 0 => [:literal, :literal, 'a', 0, 1] include_examples 'scan', 'a(اbب+)cت?', 1 => [:group, :capture, '(', 1, 2] include_examples 'scan', 'a(اbب+)cت?', 2 => [:literal, :literal, 'اbب', 2, 7] include_examples 'scan', 'a(اbب+)cت?', 3 => [:quantifier, :one_or_more, 
'+', 7, 8] include_examples 'scan', 'a(اbب+)cت?', 4 => [:group, :close, ')', 8, 9] include_examples 'scan', 'a(اbب+)cت?', 5 => [:literal, :literal, 'cت', 9, 12] include_examples 'scan', 'a(اbب+)cت?', 6 => [:quantifier, :zero_or_one, '?', 12, 13] # 3 byte wide characters, Japanese include_examples 'scan', 'ab?れます+cd', 0 => [:literal, :literal, 'ab', 0, 2] include_examples 'scan', 'ab?れます+cd', 1 => [:quantifier, :zero_or_one, '?', 2, 3] include_examples 'scan', 'ab?れます+cd', 2 => [:literal, :literal, 'れます', 3, 12] include_examples 'scan', 'ab?れます+cd', 3 => [:quantifier, :one_or_more, '+', 12, 13] include_examples 'scan', 'ab?れます+cd', 4 => [:literal, :literal, 'cd', 13, 15] # 4 byte wide characters, Osmanya include_examples 'scan', '𐒀𐒁?𐒂ab+𐒃', 0 => [:literal, :literal, '𐒀𐒁', 0, 8] include_examples 'scan', '𐒀𐒁?𐒂ab+𐒃', 1 => [:quantifier, :zero_or_one, '?', 8, 9] include_examples 'scan', '𐒀𐒁?𐒂ab+𐒃', 2 => [:literal, :literal, '𐒂ab', 9, 15] include_examples 'scan', '𐒀𐒁?𐒂ab+𐒃', 3 => [:quantifier, :one_or_more, '+', 15, 16] include_examples 'scan', '𐒀𐒁?𐒂ab+𐒃', 4 => [:literal, :literal, '𐒃', 16, 20] include_examples 'scan', 'mu𝄞?si*𝄫c+', 0 => [:literal, :literal, 'mu𝄞', 0, 6] include_examples 'scan', 'mu𝄞?si*𝄫c+', 1 => [:quantifier, :zero_or_one, '?', 6, 7] include_examples 'scan', 'mu𝄞?si*𝄫c+', 2 => [:literal, :literal, 'si', 7, 9] include_examples 'scan', 'mu𝄞?si*𝄫c+', 3 => [:quantifier, :zero_or_more, '*', 9, 10] include_examples 'scan', 'mu𝄞?si*𝄫c+', 4 => [:literal, :literal, '𝄫c', 10, 15] include_examples 'scan', 'mu𝄞?si*𝄫c+', 5 => [:quantifier, :one_or_more, '+', 15, 16] end regexp_parser-1.6.0/spec/syntax/0000755000004100000410000000000013541126476016705 5ustar www-datawww-dataregexp_parser-1.6.0/spec/syntax/versions/0000755000004100000410000000000013541126476020555 5ustar www-datawww-dataregexp_parser-1.6.0/spec/syntax/versions/1.9.1_spec.rb0000644000004100000410000000053213541126476022562 0ustar www-datawww-datarequire 'spec_helper' 
RSpec.describe(Regexp::Syntax::V1_9_1) do include_examples 'syntax', Regexp::Syntax.new('ruby/1.9.1'), implements: { escape: T::Escape::Hex + T::Escape::Octal + T::Escape::Unicode, type: T::CharacterType::Hex, quantifier: T::Quantifier::Greedy + T::Quantifier::Reluctant + T::Quantifier::Possessive } end regexp_parser-1.6.0/spec/syntax/versions/2.0.0_spec.rb0000644000004100000410000000047713541126476022561 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax::V2_0_0) do include_examples 'syntax', Regexp::Syntax.new('ruby/2.0.0'), implements: { property: T::UnicodeProperty::Age_V2_0_0, nonproperty: T::UnicodeProperty::Age_V2_0_0 }, excludes: { property: [:newline], nonproperty: [:newline] } end regexp_parser-1.6.0/spec/syntax/versions/1.9.3_spec.rb0000644000004100000410000000047613541126476022573 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax::V1_9_3) do include_examples 'syntax', Regexp::Syntax.new('ruby/1.9.3'), implements: { property: T::UnicodeProperty::Script_V1_9_3 + T::UnicodeProperty::Age_V1_9_3, nonproperty: T::UnicodeProperty::Script_V1_9_3 + T::UnicodeProperty::Age_V1_9_3 } end regexp_parser-1.6.0/spec/syntax/versions/aliases_spec.rb0000644000004100000410000000355313541126476023543 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax) do RSpec.shared_examples 'syntax alias' do |string, klass| it "aliases #{string} to #{klass}" do syntax = Regexp::Syntax.new(string) expect(syntax).to be_a klass end end include_examples 'syntax alias', 'ruby/1.8.6', Regexp::Syntax::V1_8_6 include_examples 'syntax alias', 'ruby/1.8', Regexp::Syntax::V1_8_6 include_examples 'syntax alias', 'ruby/1.9.1', Regexp::Syntax::V1_9_1 include_examples 'syntax alias', 'ruby/1.9', Regexp::Syntax::V1_9_3 include_examples 'syntax alias', 'ruby/2.0.0', Regexp::Syntax::V1_9 include_examples 'syntax alias', 'ruby/2.0', Regexp::Syntax::V2_0_0 include_examples 'syntax alias', 'ruby/2.1', Regexp::Syntax::V2_0_0 
include_examples 'syntax alias', 'ruby/2.2.0', Regexp::Syntax::V2_0_0 include_examples 'syntax alias', 'ruby/2.2.10', Regexp::Syntax::V2_0_0 include_examples 'syntax alias', 'ruby/2.2', Regexp::Syntax::V2_0_0 include_examples 'syntax alias', 'ruby/2.3.0', Regexp::Syntax::V2_3_0 include_examples 'syntax alias', 'ruby/2.3', Regexp::Syntax::V2_3_0 include_examples 'syntax alias', 'ruby/2.4.0', Regexp::Syntax::V2_4_0 include_examples 'syntax alias', 'ruby/2.4.1', Regexp::Syntax::V2_4_1 include_examples 'syntax alias', 'ruby/2.5.0', Regexp::Syntax::V2_4_1 include_examples 'syntax alias', 'ruby/2.5', Regexp::Syntax::V2_5_0 include_examples 'syntax alias', 'ruby/2.6.0', Regexp::Syntax::V2_5_0 include_examples 'syntax alias', 'ruby/2.6.2', Regexp::Syntax::V2_6_2 include_examples 'syntax alias', 'ruby/2.6.3', Regexp::Syntax::V2_6_3 include_examples 'syntax alias', 'ruby/2.6', Regexp::Syntax::V2_6_3 specify('future alias warning') do expect { Regexp::Syntax.new('ruby/5.0') } .to output(/This library .* but you are running .* \(feature set of .*\)/) .to_stderr end end regexp_parser-1.6.0/spec/syntax/versions/1.8.6_spec.rb0000644000004100000410000000117513541126476022572 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax::V1_8_6) do include_examples 'syntax', Regexp::Syntax.new('ruby/1.8.6'), implements: { assertion: T::Assertion::Lookahead, backref: [:number], escape: T::Escape::Basic + T::Escape::ASCII + T::Escape::Meta + T::Escape::Control, group: T::Group::V1_8_6, quantifier: T::Quantifier::Greedy + T::Quantifier::Reluctant + T::Quantifier::Interval + T::Quantifier::IntervalReluctant }, excludes: { assertion: T::Assertion::Lookbehind, backref: T::Backreference::All - [:number] + T::SubexpressionCall::All, quantifier: T::Quantifier::Possessive } end regexp_parser-1.6.0/spec/syntax/versions/2.2.0_spec.rb0000644000004100000410000000047613541126476022562 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax::V2_2_0) do 
include_examples 'syntax', Regexp::Syntax.new('ruby/2.2.0'), implements: { property: T::UnicodeProperty::Script_V2_2_0 + T::UnicodeProperty::Age_V2_2_0, nonproperty: T::UnicodeProperty::Script_V2_2_0 + T::UnicodeProperty::Age_V2_2_0 } end regexp_parser-1.6.0/spec/syntax/syntax_spec.rb0000644000004100000410000000317613541126476021601 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax) do specify('unknown name') do expect { Regexp::Syntax.new('ruby/1.0') }.to raise_error(Regexp::Syntax::UnknownSyntaxNameError) end specify('new') do expect(Regexp::Syntax.new('ruby/1.9.3')).to be_instance_of(Regexp::Syntax::V1_9_3) end specify('new any') do expect(Regexp::Syntax.new('any')).to be_instance_of(Regexp::Syntax::Any) expect(Regexp::Syntax.new('*')).to be_instance_of(Regexp::Syntax::Any) end specify('not implemented') do expect { RP.parse('\\p{alpha}', 'ruby/1.8') }.to raise_error(Regexp::Syntax::NotImplementedError) end specify('supported?') do expect(Regexp::Syntax.supported?('ruby/1.1.1')).to be false expect(Regexp::Syntax.supported?('ruby/2.4.3')).to be true expect(Regexp::Syntax.supported?('ruby/2.5')).to be true end specify('invalid version') do expect { Regexp::Syntax.version_class('2.0.0') }.to raise_error(Regexp::Syntax::InvalidVersionNameError) expect { Regexp::Syntax.version_class('ruby/20') }.to raise_error(Regexp::Syntax::InvalidVersionNameError) end specify('version class tiny version') do expect(Regexp::Syntax.version_class('ruby/1.9.3')).to eq Regexp::Syntax::V1_9_3 expect(Regexp::Syntax.version_class('ruby/2.3.1')).to eq Regexp::Syntax::V2_3_1 end specify('version class minor version') do expect(Regexp::Syntax.version_class('ruby/1.9')).to eq Regexp::Syntax::V1_9 expect(Regexp::Syntax.version_class('ruby/2.3')).to eq Regexp::Syntax::V2_3 end specify('raises for unknown constant lookups') do expect { Regexp::Syntax::V1 }.to raise_error(/V1/) end end 
regexp_parser-1.6.0/spec/syntax/syntax_token_map_spec.rb0000644000004100000410000000115713541126476023633 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Syntax::Token::Map) do let(:map) { Regexp::Syntax::Token::Map } specify('is complete') do latest_syntax = Regexp::Syntax.new('ruby/2.9') latest_syntax.features.each do |type, tokens| tokens.each { |token| expect(map[type]).to include(token) } end end specify('contains no duplicate type/token combinations') do combinations = map.flat_map do |type, tokens| tokens.map { |token| "#{type} #{token}" } end non_uniq = combinations.group_by { |str| str }.select { |_, v| v.count > 1 } expect(non_uniq.keys).to be_empty end end regexp_parser-1.6.0/spec/token/0000755000004100000410000000000013541126476016477 5ustar www-datawww-dataregexp_parser-1.6.0/spec/token/token_spec.rb0000644000004100000410000000363213541126476021162 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe(Regexp::Token) do specify('#offset') do regexp = /ab?cd/ tokens = RL.lex(regexp) expect(tokens[1].text).to eq 'b' expect(tokens[1].offset).to eq [1, 2] expect(tokens[2].text).to eq '?' expect(tokens[2].offset).to eq [2, 3] expect(tokens[3].text).to eq 'cd' expect(tokens[3].offset).to eq [3, 5] end specify('#length') do regexp = /abc?def/ tokens = RL.lex(regexp) expect(tokens[0].text).to eq 'ab' expect(tokens[0].length).to eq 2 expect(tokens[1].text).to eq 'c' expect(tokens[1].length).to eq 1 expect(tokens[2].text).to eq '?' expect(tokens[2].length).to eq 1 expect(tokens[3].text).to eq 'def' expect(tokens[3].length).to eq 3 end specify('#to_h') do regexp = /abc?def/ tokens = RL.lex(regexp) expect(tokens[0].text).to eq 'ab' expect(tokens[0].to_h).to eq type: :literal, token: :literal, text: 'ab', ts: 0, te: 2, level: 0, set_level: 0, conditional_level: 0 expect(tokens[2].text).to eq '?' 
    expect(tokens[2].to_h).to eq type: :quantifier, token: :zero_or_one, text: '?', ts: 3, te: 4, level: 0, set_level: 0, conditional_level: 0
  end

  specify('#next') do
    regexp = /a+b?c*d{2,3}/
    tokens = RL.lex(regexp)

    a = tokens.first
    expect(a.text).to eq 'a'

    plus = a.next
    expect(plus.text).to eq '+'

    b = plus.next
    expect(b.text).to eq 'b'

    interval = tokens.last
    expect(interval.text).to eq '{2,3}'

    expect(interval.next).to be_nil
  end

  specify('#previous') do
    regexp = /a+b?c*d{2,3}/
    tokens = RL.lex(regexp)

    interval = tokens.last
    expect(interval.text).to eq '{2,3}'

    d = interval.previous
    expect(d.text).to eq 'd'

    star = d.previous
    expect(star.text).to eq '*'

    c = star.previous
    expect(c.text).to eq 'c'

    a = tokens.first
    expect(a.text).to eq 'a'
    expect(a.previous).to be_nil
  end
end
regexp_parser-1.6.0/spec/lexer/0000755000004100000410000000000013541126476016476 5ustar www-datawww-data
regexp_parser-1.6.0/spec/lexer/escapes_spec.rb0000644000004100000410000000103213541126476021454 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe('Escape lexing') do
  include_examples 'lex', '\u{62}',
    0 => [:escape, :codepoint_list, '\u{62}', 0, 6, 0, 0, 0]

  include_examples 'lex', '\u{62 63 64}',
    0 => [:escape, :codepoint_list, '\u{62 63 64}', 0, 12, 0, 0, 0]

  include_examples 'lex', '\u{62 63 64}+',
    0 => [:escape, :codepoint_list, '\u{62 63}', 0, 9, 0, 0, 0],
    1 => [:escape, :codepoint_list, '\u{64}', 9, 15, 0, 0, 0],
    2 => [:quantifier, :one_or_more, '+', 15, 16, 0, 0, 0]
end
regexp_parser-1.6.0/spec/lexer/conditionals_spec.rb0000644000004100000410000000610713541126476022527 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe('Conditional lexing') do
  include_examples 'lex', /(?<A>a)(?(<A>)b|c)/,
    3 => [:conditional, :open, '(?', 7, 9, 0, 0, 0],
    4 => [:conditional, :condition, '(<A>)', 9, 14, 0, 0, 1],
    6 => [:conditional, :separator, '|', 15, 16, 0, 0, 1],
    8 => [:conditional, :close, ')', 17, 18, 0, 0, 0]

  include_examples 'lex', /((?<A>a)(?<B>(?(<A>)b|((?(<B>)[e-g]|[h-j])))))/,
    0 => [:group, :capture, '(', 0,
1, 0, 0, 0],
    1 => [:group, :named, '(?<A>', 1, 6, 1, 0, 0],
    5 => [:conditional, :open, '(?', 13, 15, 2, 0, 0],
    6 => [:conditional, :condition, '(<A>)', 15, 20, 2, 0, 1],
    8 => [:conditional, :separator, '|', 21, 22, 2, 0, 1],
    10 => [:conditional, :open, '(?', 23, 25, 3, 0, 1],
    11 => [:conditional, :condition, '(<B>)', 25, 30, 3, 0, 2],
    12 => [:set, :open, '[', 30, 31, 3, 0, 2],
    13 => [:literal, :literal, 'e', 31, 32, 3, 1, 2],
    14 => [:set, :range, '-', 32, 33, 3, 1, 2],
    15 => [:literal, :literal, 'g', 33, 34, 3, 1, 2],
    16 => [:set, :close, ']', 34, 35, 3, 0, 2],
    17 => [:conditional, :separator, '|', 35, 36, 3, 0, 2],
    23 => [:conditional, :close, ')', 41, 42, 3, 0, 1],
    25 => [:conditional, :close, ')', 43, 44, 2, 0, 0],
    26 => [:group, :close, ')', 44, 45, 1, 0, 0],
    27 => [:group, :close, ')', 45, 46, 0, 0, 0]

  include_examples 'lex', /(a(b(c)))(?(1)(?(2)(?(3)d|e))|(?(3)(?(2)f|g)|(?(1)f|g)))/,
    9 => [:conditional, :open, '(?', 9, 11, 0, 0, 0],
    10 => [:conditional, :condition, '(1)', 11, 14, 0, 0, 1],
    11 => [:conditional, :open, '(?', 14, 16, 0, 0, 1],
    12 => [:conditional, :condition, '(2)', 16, 19, 0, 0, 2],
    13 => [:conditional, :open, '(?', 19, 21, 0, 0, 2],
    14 => [:conditional, :condition, '(3)', 21, 24, 0, 0, 3],
    16 => [:conditional, :separator, '|', 25, 26, 0, 0, 3],
    18 => [:conditional, :close, ')', 27, 28, 0, 0, 2],
    19 => [:conditional, :close, ')', 28, 29, 0, 0, 1],
    20 => [:conditional, :separator, '|', 29, 30, 0, 0, 1],
    21 => [:conditional, :open, '(?', 30, 32, 0, 0, 1],
    22 => [:conditional, :condition, '(3)', 32, 35, 0, 0, 2],
    23 => [:conditional, :open, '(?', 35, 37, 0, 0, 2],
    24 => [:conditional, :condition, '(2)', 37, 40, 0, 0, 3],
    26 => [:conditional, :separator, '|', 41, 42, 0, 0, 3],
    28 => [:conditional, :close, ')', 43, 44, 0, 0, 2],
    29 => [:conditional, :separator, '|', 44, 45, 0, 0, 2],
    30 => [:conditional, :open, '(?', 45, 47, 0, 0, 2],
    31 => [:conditional, :condition, '(1)', 47, 50, 0, 0, 3],
    33 => [:conditional, :separator, '|', 51, 52, 0, 0, 3],
    35 =>
[:conditional, :close, ')', 53, 54, 0, 0, 2],
    36 => [:conditional, :close, ')', 54, 55, 0, 0, 1],
    37 => [:conditional, :close, ')', 55, 56, 0, 0, 0]
end
regexp_parser-1.6.0/spec/lexer/all_spec.rb0000644000004100000410000000112713541126476020606 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe(Regexp::Lexer) do
  specify('lexer returns an array') do
    expect(RL.lex('abc')).to be_instance_of(Array)
  end

  specify('lexer returns tokens') do
    tokens = RL.lex('^abc+[^one]{2,3}\\b\\d\\\\C-C$')
    expect(tokens).to all(be_a Regexp::Token)
    expect(tokens.map { |token| token.to_a.length }).to all(eq 8)
  end

  specify('lexer token count') do
    tokens = RL.lex(/^(one|two){2,3}([^d\]efm-qz\,\-]*)(ghi)+$/i)
    expect(tokens.length).to eq 28
  end

  specify('lexer scan alias') do
    expect(RL.scan(/a|b|c/)).to eq RL.lex(/a|b|c/)
  end
end
regexp_parser-1.6.0/spec/lexer/keep_spec.rb0000644000004100000410000000043613541126476020764 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe('Keep lexing') do
  include_examples 'lex', /ab\Kcd/,
    1 => [:keep, :mark, '\K', 2, 4, 0, 0, 0]

  include_examples 'lex', /(a\Kb)|(c\\\Kd)ef/,
    2 => [:keep, :mark, '\K', 2, 4, 1, 0, 0],
    9 => [:keep, :mark, '\K', 11, 13, 1, 0, 0]
end
regexp_parser-1.6.0/spec/lexer/refcalls_spec.rb0000644000004100000410000000467513541126476021644 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe('RefCall lexing') do
  # Traditional numerical group back-reference
  include_examples 'lex', '(abc)\1',
    3 => [:backref, :number, '\1', 5, 7, 0, 0, 0]

  # Group back-references, named, numbered, and relative
  include_examples 'lex', '(?<X>abc)\k<X>',
    3 => [:backref, :name_ref, '\k<X>', 9, 14, 0, 0, 0]
  include_examples 'lex', "(?<X>abc)\\k'X'",
    3 => [:backref, :name_ref, "\\k'X'", 9, 14, 0, 0, 0]
  include_examples 'lex', '(abc)\k<1>',
    3 => [:backref, :number_ref, '\k<1>', 5, 10, 0, 0, 0]
  include_examples 'lex', "(abc)\\k'1'",
    3 => [:backref, :number_ref, "\\k'1'", 5, 10, 0, 0, 0]
  include_examples 'lex', '(abc)\k<-1>',
    3 => [:backref, :number_rel_ref,
'\k<-1>', 5, 11, 0, 0, 0]
  include_examples 'lex', "(abc)\\k'-1'",
    3 => [:backref, :number_rel_ref, "\\k'-1'", 5, 11, 0, 0, 0]

  # Sub-expression invocation, named, numbered, and relative
  include_examples 'lex', '(?<X>abc)\g<X>',
    3 => [:backref, :name_call, '\g<X>', 9, 14, 0, 0, 0]
  include_examples 'lex', "(?<X>abc)\\g'X'",
    3 => [:backref, :name_call, "\\g'X'", 9, 14, 0, 0, 0]
  include_examples 'lex', '(abc)\g<1>',
    3 => [:backref, :number_call, '\g<1>', 5, 10, 0, 0, 0]
  include_examples 'lex', "(abc)\\g'1'",
    3 => [:backref, :number_call, "\\g'1'", 5, 10, 0, 0, 0]
  include_examples 'lex', '(abc)\g<-1>',
    3 => [:backref, :number_rel_call, '\g<-1>', 5, 11, 0, 0, 0]
  include_examples 'lex', "(abc)\\g'-1'",
    3 => [:backref, :number_rel_call, "\\g'-1'", 5, 11, 0, 0, 0]
  include_examples 'lex', '(abc)\g<+1>',
    3 => [:backref, :number_rel_call, '\g<+1>', 5, 11, 0, 0, 0]
  include_examples 'lex', "(abc)\\g'+1'",
    3 => [:backref, :number_rel_call, "\\g'+1'", 5, 11, 0, 0, 0]

  # Group back-references, with nesting level
  include_examples 'lex', '(?<X>abc)\k<X-0>',
    3 => [:backref, :name_recursion_ref, '\k<X-0>', 9, 16, 0, 0, 0]
  include_examples 'lex', "(?<X>abc)\\k'X-0'",
    3 => [:backref, :name_recursion_ref, "\\k'X-0'", 9, 16, 0, 0, 0]
  include_examples 'lex', '(abc)\k<1-0>',
    3 => [:backref, :number_recursion_ref, '\k<1-0>', 5, 12, 0, 0, 0]
  include_examples 'lex', "(abc)\\k'1-0'",
    3 => [:backref, :number_recursion_ref, "\\k'1-0'", 5, 12, 0, 0, 0]
end
regexp_parser-1.6.0/spec/lexer/nesting_spec.rb0000644000004100000410000001263013541126476021506 0ustar www-datawww-data
require 'spec_helper'

RSpec.describe('Nesting lexing') do
  include_examples 'lex', /(((b)))/,
    0 => [:group, :capture, '(', 0, 1, 0, 0, 0],
    1 => [:group, :capture, '(', 1, 2, 1, 0, 0],
    2 => [:group, :capture, '(', 2, 3, 2, 0, 0],
    3 => [:literal, :literal, 'b', 3, 4, 3, 0, 0],
    4 => [:group, :close, ')', 4, 5, 2, 0, 0],
    5 => [:group, :close, ')', 5, 6, 1, 0, 0],
    6 => [:group, :close, ')', 6, 7, 0, 0, 0]

  include_examples 'lex', /(\((b)\))/,
    0 => [:group, :capture,
'(', 0, 1, 0, 0, 0],
    1 => [:escape, :group_open, '\(', 1, 3, 1, 0, 0],
    2 => [:group, :capture, '(', 3, 4, 1, 0, 0],
    3 => [:literal, :literal, 'b', 4, 5, 2, 0, 0],
    4 => [:group, :close, ')', 5, 6, 1, 0, 0],
    5 => [:escape, :group_close, '\)', 6, 8, 1, 0, 0],
    6 => [:group, :close, ')', 8, 9, 0, 0, 0]

  include_examples 'lex', /(?>a(?>b(?>c)))/,
    0 => [:group, :atomic, '(?>', 0, 3, 0, 0, 0],
    2 => [:group, :atomic, '(?>', 4, 7, 1, 0, 0],
    4 => [:group, :atomic, '(?>', 8, 11, 2, 0, 0],
    6 => [:group, :close, ')', 12, 13, 2, 0, 0],
    7 => [:group, :close, ')', 13, 14, 1, 0, 0],
    8 => [:group, :close, ')', 14, 15, 0, 0, 0]

  include_examples 'lex', /(?:a(?:b(?:c)))/,
    0 => [:group, :passive, '(?:', 0, 3, 0, 0, 0],
    2 => [:group, :passive, '(?:', 4, 7, 1, 0, 0],
    4 => [:group, :passive, '(?:', 8, 11, 2, 0, 0],
    6 => [:group, :close, ')', 12, 13, 2, 0, 0],
    7 => [:group, :close, ')', 13, 14, 1, 0, 0],
    8 => [:group, :close, ')', 14, 15, 0, 0, 0]

  include_examples 'lex', /(?=a(?!b(?<=c(?<!d))))/,
    0 => [:assertion, :lookahead, '(?=', 0, 3, 0, 0, 0],
    2 => [:assertion, :nlookahead, '(?!', 4, 7, 1, 0, 0],
    4 => [:assertion, :lookbehind, '(?<=', 8, 12, 2, 0, 0],
    6 => [:assertion, :nlookbehind, '(?<!', 13, 17, 3, 0, 0],
    8 =>
[:group, :close, ')', 18, 19, 3, 0, 0], 9 => [:group, :close, ')', 19, 20, 2, 0, 0], 10 => [:group, :close, ')', 20, 21, 1, 0, 0], 11 => [:group, :close, ')', 21, 22, 0, 0, 0] include_examples 'lex', /((?#a)b(?#c)d(?#e))/, 0 => [:group, :capture, '(', 0, 1, 0, 0, 0], 1 => [:group, :comment, '(?#a)', 1, 6, 1, 0, 0], 3 => [:group, :comment, '(?#c)', 7, 12, 1, 0, 0], 5 => [:group, :comment, '(?#e)', 13, 18, 1, 0, 0], 6 => [:group, :close, ')', 18, 19, 0, 0, 0] include_examples 'lex', /a[b-e]f/, 1 => [:set, :open, '[', 1, 2, 0, 0, 0], 2 => [:literal, :literal, 'b', 2, 3, 0, 1, 0], 3 => [:set, :range, '-', 3, 4, 0, 1, 0], 4 => [:literal, :literal, 'e', 4, 5, 0, 1, 0], 5 => [:set, :close, ']', 5, 6, 0, 0, 0] include_examples 'lex', /[[:word:]&&[^c]z]/, 0 => [:set, :open, '[', 0, 1, 0, 0, 0], 1 => [:posixclass, :word, '[:word:]', 1, 9, 0, 1, 0], 2 => [:set, :intersection, '&&', 9, 11, 0, 1, 0], 3 => [:set, :open, '[', 11, 12, 0, 1, 0], 4 => [:set, :negate, '^', 12, 13, 0, 2, 0], 5 => [:literal, :literal, 'c', 13, 14, 0, 2, 0], 6 => [:set, :close, ']', 14, 15, 0, 1, 0], 7 => [:literal, :literal, 'z', 15, 16, 0, 1, 0], 8 => [:set, :close, ']', 16, 17, 0, 0, 0] include_examples 'lex', /[\p{word}&&[^c]z]/, 0 => [:set, :open, '[', 0, 1, 0, 0, 0], 1 => [:property, :word, '\p{word}', 1, 9, 0, 1, 0], 2 => [:set, :intersection, '&&', 9, 11, 0, 1, 0], 3 => [:set, :open, '[', 11, 12, 0, 1, 0], 4 => [:set, :negate, '^', 12, 13, 0, 2, 0], 5 => [:literal, :literal, 'c', 13, 14, 0, 2, 0], 6 => [:set, :close, ']', 14, 15, 0, 1, 0], 7 => [:literal, :literal, 'z', 15, 16, 0, 1, 0], 8 => [:set, :close, ']', 16, 17, 0, 0, 0] include_examples 'lex', /[a[b[c[d-g]]]]/, 0 => [:set, :open, '[', 0, 1, 0, 0, 0], 1 => [:literal, :literal, 'a', 1, 2, 0, 1, 0], 2 => [:set, :open, '[', 2, 3, 0, 1, 0], 3 => [:literal, :literal, 'b', 3, 4, 0, 2, 0], 4 => [:set, :open, '[', 4, 5, 0, 2, 0], 5 => [:literal, :literal, 'c', 5, 6, 0, 3, 0], 6 => [:set, :open, '[', 6, 7, 0, 3, 0], 7 => [:literal, :literal, 'd', 
7, 8, 0, 4, 0], 8 => [:set, :range, '-', 8, 9, 0, 4, 0], 9 => [:literal, :literal, 'g', 9, 10, 0, 4, 0], 10 => [:set, :close, ']', 10, 11, 0, 3, 0], 11 => [:set, :close, ']', 11, 12, 0, 2, 0], 12 => [:set, :close, ']', 12, 13, 0, 1, 0], 13 => [:set, :close, ']', 13, 14, 0, 0, 0] end regexp_parser-1.6.0/spec/lexer/literals_spec.rb0000644000004100000410000001014113541126476021651 0ustar www-datawww-datarequire 'spec_helper' RSpec.describe('Literal lexing') do # ascii, single byte characters include_examples 'lex', 'a', 0 => [:literal, :literal, 'a', 0, 1, 0, 0, 0] include_examples 'lex', 'ab+', 0 => [:literal, :literal, 'a', 0, 1, 0, 0, 0], 1 => [:literal, :literal, 'b', 1, 2, 0, 0, 0], 2 => [:quantifier, :one_or_more, '+', 2, 3, 0, 0, 0] # 2 byte wide characters, Arabic include_examples 'lex', 'ا', 0 => [:literal, :literal, 'ا', 0, 2, 0, 0, 0] include_examples 'lex', 'aاbبcت', 0 => [:literal, :literal, 'aاbبcت', 0, 9, 0, 0, 0] include_examples 'lex', 'aاbبت?', 0 => [:literal, :literal, 'aاbب', 0, 6, 0, 0, 0], 1 => [:literal, :literal, 'ت', 6, 8, 0, 0, 0], 2 => [:quantifier, :zero_or_one, '?', 8, 9, 0, 0, 0] include_examples 'lex', 'aا?bبcت+', 0 => [:literal, :literal, 'a', 0, 1, 0, 0, 0], 1 => [:literal, :literal, 'ا', 1, 3, 0, 0, 0], 2 => [:quantifier, :zero_or_one, '?', 3, 4, 0, 0, 0], 3 => [:literal, :literal, 'bبc', 4, 8, 0, 0, 0], 4 => [:literal, :literal, 'ت', 8, 10, 0, 0, 0], 5 => [:quantifier, :one_or_more, '+', 10, 11, 0, 0, 0] include_examples 'lex', 'a(اbب+)cت?', 0 => [:literal, :literal, 'a', 0, 1, 0, 0, 0], 1 => [:group, :capture, '(', 1, 2, 0, 0, 0], 2 => [:literal, :literal, 'اb', 2, 5, 1, 0, 0], 3 => [:literal, :literal, 'ب', 5, 7, 1, 0, 0], 4 => [:quantifier, :one_or_more, '+', 7, 8, 1, 0, 0], 5 => [:group, :close, ')', 8, 9, 0, 0, 0], 6 => [:literal, :literal, 'c', 9, 10, 0, 0, 0], 7 => [:literal, :literal, 'ت', 10, 12, 0, 0, 0], 8 => [:quantifier, :zero_or_one, '?', 12, 13, 0, 0, 0] # 3 byte wide characters, Japanese include_examples 'lex', 
'ab?れます+cd', 0 => [:literal, :literal, 'a', 0, 1, 0, 0, 0], 1 => [:literal, :literal, 'b', 1, 2, 0, 0, 0], 2 => [:quantifier, :zero_or_one, '?', 2, 3, 0, 0, 0], 3 => [:literal, :literal, 'れま', 3, 9, 0, 0, 0], 4 => [:literal, :literal, 'す', 9, 12, 0, 0, 0], 5 => [:quantifier, :one_or_more, '+', 12, 13, 0, 0, 0], 6 => [:literal, :literal, 'cd', 13, 15, 0, 0, 0] # 4 byte wide characters, Osmanya include_examples 'lex', '𐒀𐒁?𐒂ab+𐒃', 0 => [:literal, :literal, '𐒀', 0, 4, 0, 0, 0], 1 => [:literal, :literal, '𐒁', 4, 8, 0, 0, 0], 2 => [:quantifier, :zero_or_one, '?', 8, 9, 0, 0, 0], 3 => [:literal, :literal, '𐒂a', 9, 14, 0, 0, 0], 4 => [:literal, :literal, 'b', 14, 15, 0, 0, 0], 5 => [:quantifier, :one_or_more, '+', 15, 16, 0, 0, 0], 6 => [:literal, :literal, '𐒃', 16, 20, 0, 0, 0] include_examples 'lex', 'mu𝄞?si*𝄫c+', 0 => [:literal, :literal, 'mu', 0, 2, 0, 0, 0], 1 => [:literal, :literal, '𝄞', 2, 6, 0, 0, 0], 2 => [:quantifier, :zero_or_one, '?', 6, 7, 0, 0, 0], 3 => [:literal, :literal, 's', 7, 8, 0, 0, 0], 4 => [:literal, :literal, 'i', 8, 9, 0, 0, 0], 5 => [:quantifier, :zero_or_more, '*', 9, 10, 0, 0, 0], 6 => [:literal, :literal, '𝄫', 10, 14, 0, 0, 0], 7 => [:literal, :literal, 'c', 14, 15, 0, 0, 0], 8 => [:quantifier, :one_or_more, '+', 15, 16, 0, 0, 0] specify('lex single 2 byte char') do tokens = RL.lex("\u0627+") expect(tokens.count).to eq 2 end specify('lex single 3 byte char') do tokens = RL.lex("\u308C+") expect(tokens.count).to eq 2 end specify('lex single 4 byte char') do tokens = RL.lex("\u{1D11E}+") expect(tokens.count).to eq 2 end end regexp_parser-1.6.0/spec/spec_helper.rb0000644000004100000410000000053113541126476020174 0ustar www-datawww-datarequire 'regexp_parser' require 'regexp_property_values' require_relative 'support/shared_examples' RS = Regexp::Scanner RL = Regexp::Lexer RP = Regexp::Parser RE = Regexp::Expression T = Regexp::Syntax::Token include Regexp::Expression def ruby_version_at_least(version) Gem::Version.new(RUBY_VERSION.dup) >= 
    Gem::Version.new(version)
end
regexp_parser-1.6.0/spec/support/0000755000004100000410000000000013541126476017073 5ustar www-datawww-data
regexp_parser-1.6.0/spec/support/runner.rb0000644000004100000410000000136313541126476020734 0ustar www-datawww-data
require 'pathname'
require 'rspec'

module RegexpParserSpec
  class Runner
    def initialize(arguments, warning_whitelist)
      @arguments = arguments
      @warning_whitelist = warning_whitelist
    end

    def run
      spec_status = nil

      Warning::Filter.new(warning_whitelist).assert_expected_warnings_only do
        setup
        spec_status = run_rspec
      end

      spec_status
    end

    private

    def setup
      $VERBOSE = true

      spec_files.each(&method(:require))
    end

    def run_rspec
      RSpec::Core::Runner.run([])
    end

    def spec_files
      arguments
        .map { |path| Pathname.new(path).expand_path.freeze }
        .select(&:file?)
    end

    attr_reader :arguments, :warning_whitelist
  end
end
regexp_parser-1.6.0/spec/support/warning_extractor.rb0000644000004100000410000000230413541126476023157 0ustar www-datawww-data
require 'set'
require 'delegate'

module RegexpParserSpec
  class Warning
    class UnexpectedWarnings < StandardError
      MSG = 'Unexpected warnings: %s'.freeze

      def initialize(warnings)
        super(MSG % warnings.join("\n"))
      end
    end

    class Filter
      def initialize(whitelist)
        @whitelist = whitelist
      end

      def assert_expected_warnings_only
        original = $stderr
        $stderr = Extractor.new(original, @whitelist)

        yield

        assert_no_warnings($stderr.warnings)
      ensure
        $stderr = original
      end

      private

      def assert_no_warnings(warnings)
        raise UnexpectedWarnings, warnings.to_a if warnings.any?
      end
    end

    class Extractor < DelegateClass(IO)
      PATTERN = /\A(?:.+):(?:\d+): warning: (?:.+)\n\z/

      def initialize(io, whitelist)
        @whitelist = whitelist
        @warnings = Set.new
        super(io)
      end

      def write(message)
        return super if PATTERN !~ message

        warning = message.chomp
        @warnings << warning if @whitelist.none?(&warning.method(:include?))

        self
      end

      def warnings
        @warnings.dup.freeze
      end
    end
  end
end
regexp_parser-1.6.0/spec/support/shared_examples.rb0000644000004100000410000000447313541126476022574 0ustar www-datawww-data
RSpec.shared_examples 'syntax' do |klass, opts|
  opts[:implements].each do |type, tokens|
    tokens.each do |token|
      it("implements #{token} #{type}") do
        expect(klass.implements?(type, token)).to be true
      end
    end
  end

  opts[:excludes] && opts[:excludes].each do |type, tokens|
    tokens.each do |token|
      it("does not implement #{token} #{type}") do
        expect(klass.implements?(type, token)).to be false
      end
    end
  end
end

RSpec.shared_examples 'scan' do |pattern, checks|
  context "given the pattern #{pattern}" do
    before(:all) { @tokens = Regexp::Scanner.scan(pattern) }

    checks.each do |index, (type, token, text, ts, te)|
      it "scans token #{index} as #{token} #{type} at #{ts}..#{te}" do
        result = @tokens.at(index)

        expect(result[0]).to eq type
        expect(result[1]).to eq token
        expect(result[2]).to eq text
        expect(result[3]).to eq ts
        expect(result[4]).to eq te
      end
    end
  end
end

RSpec.shared_examples 'lex' do |pattern, checks|
  context "given the pattern #{pattern}" do
    before(:all) { @tokens = Regexp::Lexer.lex(pattern) }

    checks.each do |index, (type, token, text, ts, te, lvl, set_lvl, cond_lvl)|
      it "lexes token #{index} as #{token} #{type} at #{lvl}, #{set_lvl}, #{cond_lvl}" do
        struct = @tokens.at(index)

        expect(struct.type).to eq type
        expect(struct.token).to eq token
        expect(struct.text).to eq text
        expect(struct.ts).to eq ts
        expect(struct.te).to eq te
        expect(struct.level).to eq lvl
        expect(struct.set_level).to eq set_lvl
        expect(struct.conditional_level).to eq cond_lvl
      end
    end
  end
end
RSpec.shared_examples 'parse' do |pattern, checks| context "given the pattern #{pattern}" do before(:all) { @root = Regexp::Parser.parse(pattern, '*') } checks.each do |path, (type, token, klass, attributes)| it "parses expression at #{path} as #{klass}" do exp = @root.dig(*path) expect(exp).to be_instance_of(klass) expect(exp.type).to eq type expect(exp.token).to eq token attributes && attributes.each do |method, value| expect(exp.send(method)).to eq(value), "expected expression at #{path} to have #{method} #{value}" end end end end end regexp_parser-1.6.0/CHANGELOG.md0000644000004100000410000003666613541126475016256 0ustar www-datawww-data## [Unreleased] ### [1.6.0] - 2019-06-16 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - Added support for 16 new unicode properties introduced in Ruby 2.6.2 and 2.6.3 ### [1.5.1] - 2019-05-23 - [Janosch Müller](mailto:janosch84@gmail.com) ### Fixed - Fixed `#options` (and thus `#i?`, `#u?` etc.) not being set for some expressions: * this affected posix classes as well as alternation, conditional, and intersection branches * `#options` was already correct for all child expressions of such branches * this only made an operational difference for posix classes as they respect encoding flags - Fixed `#options` not respecting all negative options in weird cases like '(?u-m-x)' - Fixed `Group#option_changes` not accounting for indirectly disabled (overridden) encoding flags - Fixed `Scanner` allowing negative encoding options if there were no positive options, e.g. 
'(?-u)' - Fixed `ScannerError` for some valid meta/control sequences such as '\\C-\\\\' - Fixed `Expression#match` and `#=~` not working with a single argument ### [1.5.0] - 2019-05-14 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - Added `#referenced_expression` for backrefs, subexp calls and conditionals * returns the `Group` expression that is being referenced via name or number - Added `Expression#repetitions` * returns a `Range` of allowed repetitions (`1..1` if there is no quantifier) * like `#quantity` but with a more uniform interface - Added `Expression#match_length` * allows to inspect and iterate over String lengths matched by the Expression ### Fixed - Fixed `Expression#clone` "direction" * it used to dup ivars onto the callee, leaving only the clone referencing the original objects * this will affect you if you call `#eql?`/`#equal?` on expressions or use them as Hash keys - Fixed `#clone` results for `Sequences`, e.g. alternations and conditionals * the inner `#text` was cloned onto the `Sequence` and thus duplicated * e.g. `Regexp::Parser.parse(/(a|bc)/).clone.to_s # => (aa|bcbc)` - Fixed inconsistent `#to_s` output for `Sequences` * it used to return only the "specific" text, e.g. 
"|" for an alternation * now it includes nested expressions as it does for all other `Subexpressions` - Fixed quantification of codepoint lists with more than one entry (`\u{62 63 64}+`) * quantifiers apply only to the last entry, so this token is now split up if quantified ### [1.4.0] - 2019-04-02 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - Added support for 19 new unicode properties introduced in Ruby 2.6.0 ### [1.3.0] - 2018-11-14 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - `Syntax#features` returns a `Hash` of all types and tokens supported by a given `Syntax` ### Fixed - Thanks to [Akira Matsuda](https://github.com/amatsuda) * eliminated warning "assigned but unused variable - testEof" ## [1.2.0] - 2018-09-28 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - `Subexpression` (branch node) includes `Enumerable`, allowing to `#select` children etc. ### Fixed - Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re` - `Conditional::Condition` no longer lives outside the recursive `#expressions` tree - it used to be the only expression stored in a custom ivar, complicating traversal - its setter and getter (`#condition=`, `#condition`) still work as before ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com) ### Added - Added `Quantifier` methods `#greedy?`, `#possessive?`, `#reluctant?`/`#lazy?` - Added `Group::Options#option_changes` - shows the options enabled or disabled by the given options group - as with all other expressions, `#options` shows the overall active options - Added `Conditional#reference` and `Condition#reference`, indicating the determinative group - Added `Subexpression#dig`, acts like [`Array#dig`](http://ruby-doc.org/core-2.5.0/Array.html#method-i-dig) ### Fixed - Fixed parsing of quantified conditional expressions (quantifiers were assigned to the wrong expression) - Fixed scanning and parsing of forward-referring subexpression calls (e.g. 
`\g<+1>`)
- `Root` and `Sequence` expressions now support the same constructor signature as all other expressions

## [1.0.0] - 2018-09-01 - [Janosch Müller](mailto:janosch84@gmail.com)

This release includes several breaking changes, mostly to character sets, #map and properties.

### Changed

- Changed handling of sets (a.k.a. character classes or "bracket expressions")
  * see PR [#55](https://github.com/ammar/regexp_parser/pull/55) / issue [#47](https://github.com/ammar/regexp_parser/issues/47) for details
  * sets are now parsed to expression trees like other nestable expressions
  * `#scan` now emits the same tokens as outside sets (no longer `:set, :member`)
  * `CharacterSet#members` has been removed
  * new `Range` and `Intersection` classes represent corresponding syntax features
  * a new `PosixClass` expression class represents e.g. `[[:ascii:]]`
  * `PosixClass` instances behave like `Property` ones, e.g. support `#negative?`
  * `#scan` emits `:(non)posixclass, :` instead of `:set, :char_(non)`
- Changed `Subexpression#map` to act like regular `Enumerable#map`
  * the old behavior is available as `Subexpression#flat_map`
  * e.g. `parse(/[a]/).map(&:to_s) == ["[a]"]`; used to be `["[a]", "a"]`
- Changed expression emissions for some escape sequences
  * `EscapeSequence::Codepoint`, `CodepointList`, `Hex` and `Octal` are now all used
  * they already existed, but were all parsed as `EscapeSequence::Literal`
  * e.g. `\x97` is now `EscapeSequence::Hex` instead of `EscapeSequence::Literal`
- Changed naming of many property tokens (emitted for `\p{...}`)
  * if you work with these tokens, see PR [#56](https://github.com/ammar/regexp_parser/pull/56) for details
  * e.g. `:punct_dash` is now `:dash_punctuation`
- Changed `(?m)` and the like to emit as `:options_switch` token (@4ade4d1)
  * allows differentiating from group-local `:options`, e.g.
`(?m:.)`
- Changed name of `Backreference::..NestLevel` to `..RecursionLevel` (@4184339)
- Changed `Backreference::Number#number` from `String` to `Integer` (@40a2231)

### Added

- Added support for all previously missing properties (about 250)
- Added `Expression::UnicodeProperty#shortcut` (e.g. returns "m" for `\p{mark}`)
- Added `#char(s)` and `#codepoint(s)` methods to all `EscapeSequence` expressions
- Added `#number`/`#name`/`#recursion_level` to all backref/call expressions (@174bf21)
- Added `#number` and `#number_at_level` to capturing group expressions (@40a2231)

### Fixed

- Fixed Ruby version mapping of some properties
- Fixed scanning of some property spellings, e.g. with dashes
- Fixed some incorrect property alias normalizations
- Fixed scanning of codepoint escapes with 6 digits (e.g. `\u{10FFFF}`)
- Fixed scanning of `\R` and `\X` within sets; they act as literals there

## [0.5.0] - 2018-04-29 - [Janosch Müller](mailto:janosch84@gmail.com)

### Changed

- Changed handling of Ruby versions (PR [#53](https://github.com/ammar/regexp_parser/pull/53))
  * New Ruby versions are now supported by default
  * Some deep-lying APIs have changed, which should not affect most users:
  * `Regexp::Syntax::VERSIONS` is gone
  * Syntax version names have changed from `Regexp::Syntax::Ruby::Vnnn` to `Regexp::Syntax::Vn_n_n`
  * Syntax version classes for Ruby versions without regex feature changes are no longer predefined and are now only created on demand / lazily
  * `Regexp::Syntax::supported?` returns true for any argument >= 1.8.6

### Fixed

- Fixed some use cases of Expression methods #strfregexp and #to_h (@e738107)

### Added

- Added full signature support to collection methods of Expressions (@aa7c55a)

## [0.4.13] - 2018-04-04 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Added ruby version files for 2.2.10 and 2.3.7

## [0.4.12] - 2018-03-30 - [Janosch Müller](mailto:janosch84@gmail.com)

- Added ruby version files for 2.4.4 and 2.5.1

## [0.4.11] - 2018-03-04 - [Janosch
Müller](mailto:janosch84@gmail.com)

- Fixed UnknownSyntaxNameError introduced in v0.4.10 if the gem's parent dir tree included a 'ruby' dir

## [0.4.10] - 2018-03-04 - [Janosch Müller](mailto:janosch84@gmail.com)

- Added ruby version file for 2.6.0
- Added support for Emoji properties (available in Ruby since 2.5.0)
- Added support for XPosixPunct and Regional_Indicator properties
- Fixed parsing of Unicode 6.0 and 7.0 script properties
- Fixed parsing of the special Assigned property
- Fixed scanning of InCyrillic_Supplement property

## [0.4.9] - 2017-12-25 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Added ruby version file for 2.5.0

## [0.4.8] - 2017-12-18 - [Janosch Müller](mailto:janosch84@gmail.com)

- Added ruby version files for 2.2.9, 2.3.6, and 2.4.3

## [0.4.7] - 2017-10-15 - [Janosch Müller](mailto:janosch84@gmail.com)

- Fixed a thread safety issue (issue #45)
- Some public class methods that were only reliable for internal use are now private instance methods (PR #46)
- Improved the usefulness of Expression#options (issue #43)
  - #options and derived methods such as #i?, #m? and #x? are now defined for all Expressions that are affected by such flags.
- Fixed scanning of whitespace following (?x) (commit 5c94bd2)
- Fixed a Parser bug where the #number attribute of traditional numerical backreferences was not set correctly (commit 851b620)

## [0.4.6] - 2017-09-18 - [Janosch Müller](mailto:janosch84@gmail.com)

- Added Parser support for hex escapes in sets (PR #36)
- Added Parser support for octal escapes (PR #37)
- Added support for cluster types \R and \X (PR #38)
- Added support for more metacontrol notations (PR #39)

## [0.4.5] - 2017-09-17 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [Janosch Müller](https://github.com/janosch-x):
  * Support ruby 2.2.7 (PR #42)
- Added ruby version files for 2.2.8, 2.3.5, and 2.4.2

## [0.4.4] - 2017-07-10 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [Janosch Müller](https://github.com/janosch-x):
  * Add support for new absence operator (PR #33)
- Thanks to [Bartek Bułat](https://github.com/barthez):
  * Add support for Ruby 2.3.4 version (PR #40)

## [0.4.3] - 2017-03-24 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Added ruby version file for 2.4.1

## [0.4.2] - 2017-01-10 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [Janosch Müller](https://github.com/janosch-x):
  * Support ruby 2.4 (PR #30)
  * Improve codepoint handling (PR #27)

## [0.4.1] - 2016-11-22 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Updated ruby version file for 2.3.3

## [0.4.0] - 2016-11-20 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Added Syntax.supported?
method
- Updated ruby versions for latest releases; 2.1.10, 2.2.6, and 2.3.2

## [0.3.6] - 2016-06-08 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [John Backus](https://github.com/backus):
  * Remove warnings (PR #26)

## [0.3.5] - 2016-05-30 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [John Backus](https://github.com/backus):
  * Fix parsing of /\xFF/n (hex:escape) (PR #24)

## [0.3.4] - 2016-05-25 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [John Backus](https://github.com/backus):
  * Fix warnings (PR #19)
- Thanks to [Dana Scheider](https://github.com/danascheider):
  * Correct error in README (PR #20)
- Fixed mistyped \h and \H character types (issue #21)
- Added ancestry syntax files for latest rubies (issue #22)

## [0.3.3] - 2016-04-26 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Thanks to [John Backus](https://github.com/backus):
  * Fixed scanning of zero length comments (PR #12)
  * Fixed missing escape:codepoint_list syntax token (PR #14)
  * Fixed to_s for modified interval quantifiers (PR #17)
- Added a note about MRI implementation quirks to Scanner section

## [0.3.2] - 2016-01-01 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Updated ruby versions for latest releases; 2.1.8, 2.2.4, and 2.3.0
- Fixed class name for UnknownSyntaxNameError exception
- Added UnicodeBlocks support to the parser.
- Added UnicodeBlocks support to the scanner.
- Added expand_members method to CharacterSet, returns traditional or unicode property forms of shorthands (\d, \W, \s, etc.)
- Improved meaning and output of %t and %T in strfregexp.
- Added syntax versions for ruby 2.1.4 and 2.1.5 and updated latest 2.1 version.
- Added to_h methods to Expression, Subexpression, and Quantifier.
- Added traversal methods; traverse, each_expression, and map.
- Added token/type test methods; type?, is?, and one_of?
- Added printing method strfregexp, inspired by strftime.
- Added scanning and parsing of free spacing (x mode) expressions.
- Improved handling of inline options (?mixdau:...)
- Added conditional expressions. Ruby 2.0.
- Added keep (\K) markers. Ruby 2.0.
- Added d, a, and u options. Ruby 2.0.
- Added missing meta sequences to the parser. They were supported by the scanner only.
- Renamed Lexer's method to lex, added an alias to the old name (scan)
- Use #map instead of #each to run the block in Lexer.lex.
- Replaced VERSION.yml file with a constant.
- Updated README
- Update tokens and scanner with new additions in Unicode 7.0.

## [0.1.6] - 2014-10-06 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Fixed test and gem building rake tasks and extracted the gem specification from the Rakefile into a .gemspec file.
- Added syntax files for missing ruby 2.x versions. These do not add extra syntax support, they just make the gem work with the newer ruby versions.
- Added .travis.yml to project root.
- README:
  - Removed note purporting runtime support for ruby 1.8.6.
  - Added a section identifying the main unsupported syntax features.
  - Added sections for Testing and Building
  - Added badges for gem version, Travis CI, and code climate.
- Updated README, fixing broken examples, and converting it from a rdoc file to Github's flavor of Markdown.
- Fixed a parser bug where an alternation sequence that contained nested expressions was incorrectly being appended to the parent expression when the nesting was exited. e.g. in /a|(b)c/, c was appended to the root.
- Fixed a bug where character types were not being correctly scanned within character sets. e.g. in [\d], two tokens were scanned; one for the backslash '\' and one for the 'd'

## [0.1.5] - 2014-01-14 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Correct ChangeLog.
- Added syntax stubs for ruby versions 2.0 and 2.1
- Added clone methods for deep copying expressions.
- Added optional format argument for to_s on expressions to return the text of the expression with (:full, the default) or without (:base) its quantifier.
- Renamed the :beginning_of_line and :end_of_line tokens to :bol and :eol.
- Fixed a bug where alternations with more than two alternatives and one of them ending in a group were being incorrectly nested.
- Improved EOF handling in general and especially from sequences like hex and control escapes.
- Fixed a bug where named groups with an empty name would return a blank token [].
- Fixed a bug where members of a parent set were being added to its last subset.
- Various code cleanups in scanner.rl
- Fixed a few mutable string bugs by calling dup on the originals.
- Made ruby 1.8.6 the base for all 1.8 syntax, and the 1.8 name a pointer to the latest (1.8.7 at this time)
- Removed look-behind assertions (positive and negative) from 1.8 syntax
- Added control (\cc and \C-c) and meta (\M-c) escapes to 1.8 syntax
- The default syntax is now the one of the running ruby version in both the lexer and the parser.

## [0.1.0] - 2010-11-21 - [Ammar Ali](mailto:ammarabuali@gmail.com)

- Initial release
regexp_parser-1.6.0/LICENSE0000644000004100000410000000205213541126475015430 0ustar www-datawww-data
Copyright (c) 2010, 2012-2015, Ammar Ali

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
regexp_parser-1.6.0/Rakefile0000644000004100000410000000454213541126475016076 0ustar www-datawww-data
require 'rubygems'

require 'rake'
require 'rake/testtask'

require 'bundler'
require 'rubygems/package_task'

RAGEL_SOURCE_DIR = File.expand_path '../lib/regexp_parser/scanner', __FILE__
RAGEL_OUTPUT_DIR = File.expand_path '../lib/regexp_parser', __FILE__
RAGEL_SOURCE_FILES = %w{scanner} # scanner.rl includes property.rl

Bundler::GemHelper.install_tasks

task :default => [:'test:full']

namespace :test do
  task full: :'ragel:rb' do
    sh 'bin/test'
  end
end

namespace :ragel do
  desc "Process the ragel source files and output ruby code"
  task :rb do |t|
    RAGEL_SOURCE_FILES.each do |file|
      output_file = "#{RAGEL_OUTPUT_DIR}/#{file}.rb"
      # using faster flat table driven FSM, about 25% larger code, but about 30% faster
      sh "ragel -F1 -R #{RAGEL_SOURCE_DIR}/#{file}.rl -o #{output_file}"

      contents = File.read(output_file)

      File.open(output_file, 'r+') do |file|
        contents = "# -*- warn-indent:false; -*-\n" + contents

        file.write(contents)
      end
    end
  end

  desc "Delete the ragel generated source file(s)"
  task :clean do |t|
    RAGEL_SOURCE_FILES.each do |file|
      sh "rm -f #{RAGEL_OUTPUT_DIR}/#{file}.rb"
    end
  end
end

# Add ragel task as a prerequisite for building the gem to ensure that the
# latest scanner code is generated and included in the build.
desc "Runs ragel:rb before building the gem"
task :build => ['ragel:rb']

namespace :props do
  desc 'Write new property value hashes for the properties scanner'
  task :update do
    require 'regexp_property_values'
    RegexpPropertyValues.update
    dir = File.expand_path('../lib/regexp_parser/scanner/properties', __FILE__)

    require 'psych'
    write_hash_to_file = ->(hash, path) do
      File.open(path, 'w') do |f|
        f.puts '#',
               "# THIS FILE IS AUTO-GENERATED BY `rake props:update`, DO NOT EDIT",
               '#',
               hash.sort.to_h.to_yaml
      end
      puts "Wrote #{hash.count} aliases to `#{path}`"
    end

    long_names_to_tokens = RegexpPropertyValues.all.map do |val|
      [val.identifier, val.full_name.downcase]
    end
    write_hash_to_file.call(long_names_to_tokens, "#{dir}/long.yml")

    short_names_to_tokens = RegexpPropertyValues.alias_hash.map do |k, v|
      [k.identifier, v.full_name.downcase]
    end
    write_hash_to_file.call(short_names_to_tokens, "#{dir}/short.yml")
  end
end
regexp_parser-1.6.0/lib/0000755000004100000410000000000013541126475015172 5ustar www-datawww-data
regexp_parser-1.6.0/lib/regexp_parser/0000755000004100000410000000000013541126476020041 5ustar www-datawww-data
regexp_parser-1.6.0/lib/regexp_parser/version.rb0000644000004100000410000000007413541126476022054 0ustar www-datawww-data
class Regexp
  class Parser
    VERSION = '1.6.0'
  end
end
regexp_parser-1.6.0/lib/regexp_parser/lexer.rb0000644000004100000410000000744413541126475021515 0ustar www-datawww-data
# A very thin wrapper around the scanner that breaks quantified literal runs,
# collects emitted tokens into an array, calculates their nesting depth,
# normalizes tokens for the parser, and checks whether they are implemented by
# the given syntax flavor.
class Regexp::Lexer

  OPENING_TOKENS = [
    :capture, :passive, :lookahead, :nlookahead, :lookbehind, :nlookbehind,
    :atomic, :options, :options_switch, :named, :absence
  ].freeze

  CLOSING_TOKENS = [:close].freeze

  def self.lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
    new.lex(input, syntax, &block)
  end

  def lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
    syntax = Regexp::Syntax.new(syntax)

    self.tokens = []
    self.nesting = 0
    self.set_nesting = 0
    self.conditional_nesting = 0
    self.shift = 0

    last = nil
    Regexp::Scanner.scan(input) do |type, token, text, ts, te|
      type, token = *syntax.normalize(type, token)
      syntax.check! type, token

      ascend(type, token)

      if type == :quantifier and last
        break_literal(last) if last.type == :literal
        break_codepoint_list(last) if last.token == :codepoint_list
      end

      current = Regexp::Token.new(type, token, text, ts + shift, te + shift,
                                  nesting, set_nesting, conditional_nesting)

      current = merge_condition(current) if type == :conditional and
        [:condition, :condition_close].include?(token)

      last.next = current if last
      current.previous = last if last

      tokens << current
      last = current

      descend(type, token)
    end

    if block_given?
      tokens.map { |t| block.call(t) }
    else
      tokens
    end
  end

  class << self
    alias :scan :lex
  end

  private

  attr_accessor :tokens, :nesting, :set_nesting, :conditional_nesting, :shift

  def ascend(type, token)
    case type
    when :group, :assertion
      self.nesting = nesting - 1 if CLOSING_TOKENS.include?(token)
    when :set
      self.set_nesting = set_nesting - 1 if token == :close
    when :conditional
      self.conditional_nesting = conditional_nesting - 1 if token == :close
    end
  end

  def descend(type, token)
    case type
    when :group, :assertion
      self.nesting = nesting + 1 if OPENING_TOKENS.include?(token)
    when :set
      self.set_nesting = set_nesting + 1 if token == :open
    when :conditional
      self.conditional_nesting = conditional_nesting + 1 if token == :open
    end
  end

  # called by scan to break a literal run that is longer than one character
  # into two separate tokens when it is followed by a quantifier
  def break_literal(token)
    lead, last, _ = token.text.partition(/.\z/mu)
    return if lead.empty?

    tokens.pop
    tokens << Regexp::Token.new(:literal, :literal, lead,
                                token.ts, (token.te - last.bytesize),
                                nesting, set_nesting, conditional_nesting)
    tokens << Regexp::Token.new(:literal, :literal, last,
                                (token.ts + lead.bytesize), token.te,
                                nesting, set_nesting, conditional_nesting)
  end

  def break_codepoint_list(token)
    lead, _, tail = token.text.rpartition(' ')
    return if lead.empty?
    tokens.pop
    tokens << Regexp::Token.new(:escape, :codepoint_list, lead + '}',
                                token.ts, (token.te - tail.length),
                                nesting, set_nesting, conditional_nesting)
    tokens << Regexp::Token.new(:escape, :codepoint_list, '\u{' + tail,
                                (token.ts + lead.length + 1), (token.te + 3),
                                nesting, set_nesting, conditional_nesting)

    self.shift = shift + 3 # one space less, but extra \, u, {, and }
  end

  def merge_condition(current)
    last = tokens.pop
    Regexp::Token.new(:conditional, :condition, last.text + current.text,
                      last.ts, current.te, nesting, set_nesting, conditional_nesting)
  end

end # class Regexp::Lexer
regexp_parser-1.6.0/lib/regexp_parser/expression/0000755000004100000410000000000013541126475022237 5ustar www-datawww-data
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/0000755000004100000410000000000013541126475023702 5ustar www-datawww-data
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/tests.rb0000644000004100000410000000567613541126475025407 0ustar www-datawww-data
module Regexp::Expression
  class Base

    # Test if this expression has the given test_type, which can be either
    # a symbol or an array of symbols to check against the expression's type.
    #
    #   # is it a :group expression
    #   exp.type? :group
    #
    #   # is it a :set, or :meta
    #   exp.type? [:set, :meta]
    #
    def type?(test_type)
      test_types = Array(test_type).map(&:to_sym)
      test_types.include?(:*) || test_types.include?(type)
    end

    # Test if this expression has the given test_token, and optionally a given
    # test_type.
    #
    #   # Any expressions
    #   exp.is? :* # always returns true
    #
    #   # is it a :capture
    #   exp.is? :capture
    #
    #   # is it a :character and a :set
    #   exp.is? :character, :set
    #
    #   # is it a :meta :dot
    #   exp.is? :dot, :meta
    #
    #   # is it a :meta or :escape :dot
    #   exp.is? :dot, [:meta, :escape]
    #
    def is?(test_token, test_type = nil)
      return true if test_token === :*
      token == test_token and (test_type ? type?(test_type) : true)
    end

    # Test if this expression matches an entry in the given scope spec.
    #
    # A scope spec can be one of:
    #
    #   . An array: Interpreted as a set of tokens, tested for inclusion
    #               of the expression's token.
    #
    #   . A hash:   Where the key is interpreted as the expression type
    #               and the value is either a symbol or an array. In this
    #               case, when the scope is a hash, one_of? calls itself to
    #               evaluate the key's value.
    #
    #   . A symbol: matches the expression's token or type, depending on
    #               the level of the call. If one_of? is called directly with
    #               a symbol then it will always be checked against the
    #               type of the expression. If it's being called for a value
    #               from a hash, it will be checked against the token of the
    #               expression.
    #
    #   # any expression
    #   exp.one_of?(:*) # always true
    #
    #   # like exp.type?(:group)
    #   exp.one_of?(:group)
    #
    #   # any expression of type meta
    #   exp.one_of?(:meta => :*)
    #
    #   # meta dots and alternations
    #   exp.one_of?(:meta => [:dot, :alternation])
    #
    #   # meta dots and any set tokens
    #   exp.one_of?({meta: [:dot], set: :*})
    #
    def one_of?(scope, top = true)
      case scope
      when Array
        scope.include?(:*) || scope.include?(token)

      when Hash
        if scope.has_key?(:*)
          test_type = scope.has_key?(type) ? type : :*
          one_of?(scope[test_type], false)
        else
          scope.has_key?(type) && one_of?(scope[type], false)
        end

      when Symbol
        scope.equal?(:*) || (top ? type?(scope) : is?(scope))

      else
        raise ArgumentError,
              "Array, Hash, or Symbol expected, #{scope.class.name} given"
      end
    end
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/traverse.rb0000644000004100000410000000351413541126475026065 0ustar www-datawww-data
module Regexp::Expression
  class Subexpression < Regexp::Expression::Base

    # Traverses the subexpression (depth-first, pre-order) and calls the given
    # block for each expression with three arguments; the traversal event,
    # the expression, and the index of the expression within its parent.
    #
    # The event argument is passed as follows:
    #
    # - For subexpressions, :enter upon entering the subexpression, and
    #   :exit upon exiting it.
    #
    # - For terminal expressions, :visit is called once.
    #
    # Returns self.
    def traverse(include_self = false, &block)
      raise 'traverse requires a block' unless block_given?

      block.call(:enter, self, 0) if include_self

      each_with_index do |exp, index|
        if exp.terminal?
          block.call(:visit, exp, index)
        else
          block.call(:enter, exp, index)
          exp.traverse(&block)
          block.call(:exit, exp, index)
        end
      end

      block.call(:exit, self, 0) if include_self

      self
    end
    alias :walk :traverse

    # Iterates over the expressions of this expression as an array, passing
    # the expression and its index within its parent to the given block.
    def each_expression(include_self = false, &block)
      traverse(include_self) do |event, exp, index|
        yield(exp, index) unless event == :exit
      end
    end

    # Returns a new array with the results of calling the given block once
    # for every expression. If a block is not given, returns an array with
    # each expression and its level index as an array.
    def flat_map(include_self = false, &block)
      result = []

      each_expression(include_self) do |exp, index|
        if block_given?
          result << yield(exp, index)
        else
          result << [exp, index]
        end
      end

      result
    end
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/strfregexp.rb0000644000004100000410000000616313541126475026426 0ustar www-datawww-data
module Regexp::Expression
  class Base

    #   %l  Level (depth) of the expression. Returns 'root' for the root
    #       expression, returns zero or higher for all others.
    #
    #   %>  Indentation at expression's level.
    #
    #   %x  Index of the expression at its depth. Available when using
    #       the strfregexp_tree method only.
    #
    #   %s  Start offset within the whole expression.
    #   %e  End offset within the whole expression.
    #   %S  Length of expression.
    #
    #   %o  Coded offset and length, same as '@%s+%S'
    #
    #   %y  Type of expression.
    #   %k  Token of expression.
    #   %i  ID, same as '%y:%k'
    #   %c  Class name
    #
    #   %q  Quantifier info, as {m[,M]}
    #   %Q  Quantifier text
    #
    #   %z  Quantifier min
    #   %Z  Quantifier max
    #
    #   %t  Base text of the expression (excludes quantifier, if any)
    #   %~t Full text if the expression is terminal, otherwise %i
    #   %T  Full text of the expression (includes quantifier, if any)
    #
    #   %b  Basic info, same as '%o %i'
    #   %m  Most info, same as '%b %q'
    #   %a  All info, same as '%m %t'
    #
    def strfregexp(format = '%a', indent_offset = 0, index = nil)
      have_index = index ? true : false

      part = {}

      print_level = nesting_level > 0 ? nesting_level - 1 : nil

      # Order is important! Fields that use other fields in their
      # definition must appear before the fields they use.
      part_keys = %w{a m b o i l x s e S y k c q Q z Z t ~t T >}
      part.keys.each {|k| part[k] = ""}

      part['>'] = print_level ? (' ' * (print_level + indent_offset)) : ''

      part['l'] = print_level ? "#{'%d' % print_level}" : 'root'
      part['x'] = "#{'%d' % index}" if have_index

      part['s'] = starts_at
      part['S'] = full_length
      part['e'] = starts_at + full_length
      part['o'] = coded_offset

      part['k'] = token
      part['y'] = type
      part['i'] = '%y:%k'
      part['c'] = self.class.name

      if quantified?
        if quantifier.max == -1
          part['q'] = "{#{quantifier.min}, or-more}"
        else
          part['q'] = "{#{quantifier.min}, #{quantifier.max}}"
        end

        part['Q'] = quantifier.text
        part['z'] = quantifier.min
        part['Z'] = quantifier.max
      else
        part['q'] = '{1}'
        part['Q'] = ''
        part['z'] = '1'
        part['Z'] = '1'
      end

      part['t'] = to_s(:base)
      part['~t'] = terminal? ? to_s : "#{type}:#{token}"
      part['T'] = to_s(:full)

      part['b'] = '%o %i'
      part['m'] = '%b %q'
      part['a'] = '%m %t'

      out = format.dup

      part_keys.each do |k|
        out.gsub!(/%#{k}/, part[k].to_s)
      end

      out
    end

    alias :strfre :strfregexp
  end

  class Subexpression < Regexp::Expression::Base
    def strfregexp_tree(format = '%a', include_self = true, separator = "\n")
      output = include_self ? [self.strfregexp(format)] : []

      output += flat_map do |exp, index|
        exp.strfregexp(format, (include_self ?
          1 : 0), index)
      end

      output.join(separator)
    end

    alias :strfre_tree :strfregexp_tree
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/options.rb0000644000004100000410000000122313541126475025720 0ustar www-datawww-data
module Regexp::Expression
  class Base
    def multiline?
      options[:m] == true
    end
    alias :m? :multiline?

    def case_insensitive?
      options[:i] == true
    end
    alias :i? :case_insensitive?
    alias :ignore_case? :case_insensitive?

    def free_spacing?
      options[:x] == true
    end
    alias :x? :free_spacing?
    alias :extended? :free_spacing?

    def default_classes?
      options[:d] == true
    end
    alias :d? :default_classes?

    def ascii_classes?
      options[:a] == true
    end
    alias :a? :ascii_classes?

    def unicode_classes?
      options[:u] == true
    end
    alias :u? :unicode_classes?
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/match.rb0000644000004100000410000000035713541126475025330 0ustar www-datawww-data
module Regexp::Expression
  class Base
    def match?(string)
      !!match(string)
    end
    alias :matches? :match?

    def match(string, offset = 0)
      Regexp.new(to_s).match(string, offset)
    end
    alias :=~ :match
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/methods/match_length.rb0000644000004100000410000001006713541126475026670 0ustar www-datawww-data
class Regexp::MatchLength
  include Enumerable

  def self.of(obj)
    exp = obj.is_a?(Regexp::Expression::Base) ? obj : Regexp::Parser.parse(obj)
    exp.match_length
  end

  def initialize(exp, opts = {})
    self.exp_class = exp.class
    self.min_rep = exp.repetitions.min
    self.max_rep = exp.repetitions.max
    if base = opts[:base]
      self.base_min = base
      self.base_max = base
      self.reify = ->{ '.' * base }
    else
      self.base_min = opts.fetch(:base_min)
      self.base_max = opts.fetch(:base_max)
      self.reify = opts.fetch(:reify)
    end
  end

  def each(opts = {})
    return enum_for(__method__) unless block_given?
    limit = opts[:limit] || 1000
    yielded = 0
    (min..max).each do |num|
      next unless include?(num)
      yield(num)
      break if (yielded += 1) >= limit
    end
  end

  def endless_each(&block)
    return enum_for(__method__) unless block_given?
    (min..max).each { |num| yield(num) if include?(num) }
  end

  def include?(length)
    test_regexp.match?('X' * length)
  end

  def fixed?
    min == max
  end

  def min
    min_rep * base_min
  end

  def max
    max_rep * base_max
  end

  def minmax
    [min, max]
  end

  def inspect
    type = exp_class.name.sub('Regexp::Expression::', '')
    "#<#{self.class}<#{type}> min=#{min} max=#{max}>"
  end

  def to_re
    "(?:#{reify.call}){#{min_rep},#{max_rep unless max_rep == Float::INFINITY}}"
  end

  private

  attr_accessor :base_min, :base_max, :min_rep, :max_rep, :exp_class, :reify

  def test_regexp
    @test_regexp ||= Regexp.new("^#{to_re}$").tap do |regexp|
      regexp.respond_to?(:match?) || def regexp.match?(str); !!match(str) end
    end
  end
end

module Regexp::Expression
  MatchLength = Regexp::MatchLength

  [
    CharacterSet,
    CharacterSet::Intersection,
    CharacterSet::IntersectedSequence,
    CharacterSet::Range,
    CharacterType::Base,
    EscapeSequence::Base,
    PosixClass,
    UnicodeProperty::Base,
  ].each do |klass|
    klass.class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def match_length
        MatchLength.new(self, base: 1)
      end
    RUBY
  end

  class Literal
    def match_length
      MatchLength.new(self, base: text.length)
    end
  end

  class Subexpression
    def match_length
      MatchLength.new(self,
                      base_min: map { |exp| exp.match_length.min }.inject(0, :+),
                      base_max: map { |exp| exp.match_length.max }.inject(0, :+),
                      reify: ->{ map { |exp| exp.match_length.to_re }.join })
    end

    def inner_match_length
      dummy = Regexp::Expression::Root.build
      dummy.expressions = expressions.map(&:clone)
      dummy.quantifier = quantifier && quantifier.clone
      dummy.match_length
    end
  end

  [
    Alternation,
    Conditional::Expression,
  ].each do |klass|
    klass.class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def match_length
        MatchLength.new(self,
                        base_min: map { |exp| exp.match_length.min }.min,
                        base_max: map { |exp|
                          exp.match_length.max }.max,
                        reify: ->{ map { |exp| exp.match_length.to_re }.join('|') })
      end
    RUBY
  end

  [
    Anchor::Base,
    Assertion::Base,
    Conditional::Condition,
    FreeSpace,
    Keep::Mark,
  ].each do |klass|
    klass.class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def match_length
        MatchLength.new(self, base: 0)
      end
    RUBY
  end

  class Backreference::Base
    def match_length
      if referenced_expression.nil?
        raise ArgumentError, 'Missing referenced_expression - not parsed?'
      end
      referenced_expression.unquantified_clone.match_length
    end
  end

  class EscapeSequence::CodepointList
    def match_length
      MatchLength.new(self, base: codepoints.count)
    end
  end

  # Special case. Absence group can match 0.. chars, irrespective of content.
  # TODO: in theory, they *can* exclude match lengths with `.`: `(?~.{3})`
  class Group::Absence
    def match_length
      MatchLength.new(self, base_min: 0, base_max: Float::INFINITY, reify: ->{ '.*' })
    end
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/sequence_operation.rb0000644000004100000410000000104313541126475026452 0ustar www-datawww-data
module Regexp::Expression
  # abstract class
  class SequenceOperation < Regexp::Expression::Subexpression
    alias :sequences :expressions
    alias :operands :expressions
    alias :operator :text

    def starts_at
      expressions.first.starts_at
    end
    alias :ts :starts_at

    def <<(exp)
      expressions.last << exp
    end

    def add_sequence(active_opts = {})
      self.class::OPERAND.add_to(self, {}, active_opts)
    end

    def to_s(format = :full)
      sequences.map { |e| e.to_s(format) }.join(text)
    end
  end
end
regexp_parser-1.6.0/lib/regexp_parser/expression/classes/0000755000004100000410000000000013541126475023674 5ustar www-datawww-data
regexp_parser-1.6.0/lib/regexp_parser/expression/classes/conditional.rb0000644000004100000410000000272613541126475026533 0ustar www-datawww-data
module Regexp::Expression
  module Conditional
    class TooManyBranches < StandardError
      def initialize
        super('The conditional expression has more than 2 branches')
      end
    end

    class Condition < Regexp::Expression::Base
attr_accessor :referenced_expression # Name or number of the referenced capturing group that determines state. # Returns a String if reference is by name, Integer if by number. def reference ref = text.tr("'<>()", "") ref =~ /\D/ ? ref : Integer(ref) end end class Branch < Regexp::Expression::Sequence; end class Expression < Regexp::Expression::Subexpression attr_accessor :referenced_expression def <<(exp) expressions.last << exp end def add_sequence(active_opts = {}) raise TooManyBranches.new if branches.length == 2 params = { conditional_level: conditional_level + 1 } Branch.add_to(self, params, active_opts) end alias :branch :add_sequence def condition=(exp) expressions.delete(condition) expressions.unshift(exp) end def condition find { |subexp| subexp.is_a?(Condition) } end def branches select { |subexp| subexp.is_a?(Sequence) } end def reference condition.reference end def to_s(format = :full) "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}" end end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/group.rb0000644000004100000410000000245413541126475025362 0ustar www-datawww-datamodule Regexp::Expression module Group class Base < Regexp::Expression::Subexpression def to_s(format = :full) "#{text}#{expressions.join})#{quantifier_affix(format)}" end def capturing?; false end def comment?; false end end class Atomic < Group::Base; end class Passive < Group::Base; end class Absence < Group::Base; end class Options < Group::Base attr_accessor :option_changes end class Capture < Group::Base attr_accessor :number, :number_at_level alias identifier number def capturing?; true end end class Named < Group::Capture attr_reader :name alias identifier name def initialize(token, options = {}) @name = token.text[3..-2] super end def initialize_clone(orig) @name = orig.name.dup super end end class Comment < Group::Base def to_s(_format = :full) text.dup end def comment?; true end end end module Assertion class Base < 
Regexp::Expression::Group::Base; end class Lookahead < Assertion::Base; end class NegativeLookahead < Assertion::Base; end class Lookbehind < Assertion::Base; end class NegativeLookbehind < Assertion::Base; end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/alternation.rb0000644000004100000410000000044713541126475026546 0ustar www-datawww-datamodule Regexp::Expression # A sequence of expressions, used by Alternation as one of its alternatives. class Alternative < Regexp::Expression::Sequence; end class Alternation < Regexp::Expression::SequenceOperation OPERAND = Alternative alias :alternatives :expressions end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/set.rb0000644000004100000410000000111113541126475025006 0ustar www-datawww-datamodule Regexp::Expression class CharacterSet < Regexp::Expression::Subexpression attr_accessor :closed, :negative alias :negative? :negative alias :negated? :negative alias :closed? :closed def initialize(token, options = {}) self.negative = false self.closed = false super end def negate self.negative = true end def close self.closed = true end def to_s(format = :full) "#{text}#{'^' if negated?}#{expressions.join}]#{quantifier_affix(format)}" end end end # module Regexp::Expression regexp_parser-1.6.0/lib/regexp_parser/expression/classes/property.rb0000644000004100000410000000675213541126475026117 0ustar www-datawww-datamodule Regexp::Expression module UnicodeProperty class Base < Regexp::Expression::Base def negative? 
type == :nonproperty end def name text =~ /\A\\[pP]\{([^}]+)\}\z/; $1 end def shortcut (Regexp::Scanner.short_prop_map.rassoc(token.to_s) || []).first end end class Alnum < Base; end class Alpha < Base; end class Ascii < Base; end class Blank < Base; end class Cntrl < Base; end class Digit < Base; end class Graph < Base; end class Lower < Base; end class Print < Base; end class Punct < Base; end class Space < Base; end class Upper < Base; end class Word < Base; end class Xdigit < Base; end class XPosixPunct < Base; end class Newline < Base; end class Any < Base; end class Assigned < Base; end module Letter class Base < UnicodeProperty::Base; end class Any < Letter::Base; end class Cased < Letter::Base; end class Uppercase < Letter::Base; end class Lowercase < Letter::Base; end class Titlecase < Letter::Base; end class Modifier < Letter::Base; end class Other < Letter::Base; end end module Mark class Base < UnicodeProperty::Base; end class Any < Mark::Base; end class Combining < Mark::Base; end class Nonspacing < Mark::Base; end class Spacing < Mark::Base; end class Enclosing < Mark::Base; end end module Number class Base < UnicodeProperty::Base; end class Any < Number::Base; end class Decimal < Number::Base; end class Letter < Number::Base; end class Other < Number::Base; end end module Punctuation class Base < UnicodeProperty::Base; end class Any < Punctuation::Base; end class Connector < Punctuation::Base; end class Dash < Punctuation::Base; end class Open < Punctuation::Base; end class Close < Punctuation::Base; end class Initial < Punctuation::Base; end class Final < Punctuation::Base; end class Other < Punctuation::Base; end end module Separator class Base < UnicodeProperty::Base; end class Any < Separator::Base; end class Space < Separator::Base; end class Line < Separator::Base; end class Paragraph < Separator::Base; end end module Symbol class Base < UnicodeProperty::Base; end class Any < Symbol::Base; end class Math < Symbol::Base; end class Currency < 
Symbol::Base; end class Modifier < Symbol::Base; end class Other < Symbol::Base; end end module Codepoint class Base < UnicodeProperty::Base; end class Any < Codepoint::Base; end class Control < Codepoint::Base; end class Format < Codepoint::Base; end class Surrogate < Codepoint::Base; end class PrivateUse < Codepoint::Base; end class Unassigned < Codepoint::Base; end end class Age < UnicodeProperty::Base; end class Derived < UnicodeProperty::Base; end class Emoji < UnicodeProperty::Base; end class Script < UnicodeProperty::Base; end class Block < UnicodeProperty::Base; end end end # module Regexp::Expression regexp_parser-1.6.0/lib/regexp_parser/expression/classes/escape.rb0000644000004100000410000000434413541126475025466 0ustar www-datawww-datamodule Regexp::Expression module EscapeSequence class Base < Regexp::Expression::Base require 'yaml' def char # poor man's unescape without using eval YAML.load(%Q(---\n"#{text}"\n)) end def codepoint char.ord end end class Literal < EscapeSequence::Base def char text[1..-1] end end class AsciiEscape < EscapeSequence::Base; end class Backspace < EscapeSequence::Base; end class Bell < EscapeSequence::Base; end class FormFeed < EscapeSequence::Base; end class Newline < EscapeSequence::Base; end class Return < EscapeSequence::Base; end class Tab < EscapeSequence::Base; end class VerticalTab < EscapeSequence::Base; end class Hex < EscapeSequence::Base; end class Codepoint < EscapeSequence::Base; end class CodepointList < EscapeSequence::Base def char raise NoMethodError, 'CodepointList responds only to #chars' end def codepoint raise NoMethodError, 'CodepointList responds only to #codepoints' end def chars codepoints.map { |cp| cp.chr('utf-8') } end def codepoints text.scan(/\h+/).map(&:hex) end end class Octal < EscapeSequence::Base def char text[1..-1].to_i(8).chr('utf-8') end end class AbstractMetaControlSequence < EscapeSequence::Base def char codepoint.chr('utf-8') end private def control_sequence_to_s(control_sequence) 
five_lsb = control_sequence.unpack('B*').first[-5..-1] ["000#{five_lsb}"].pack('B*') end def meta_char_to_codepoint(meta_char) byte_value = meta_char.ord byte_value < 128 ? byte_value + 128 : byte_value end end class Control < AbstractMetaControlSequence def codepoint control_sequence_to_s(text).ord end end class Meta < AbstractMetaControlSequence def codepoint meta_char_to_codepoint(text[-1]) end end class MetaControl < AbstractMetaControlSequence def codepoint meta_char_to_codepoint(control_sequence_to_s(text)) end end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/free_space.rb0000644000004100000410000000056413541126475026322 0ustar www-datawww-datamodule Regexp::Expression class FreeSpace < Regexp::Expression::Base def quantify(token, text, min = nil, max = nil, mode = :greedy) raise "Can not quantify a free space object" end end class Comment < Regexp::Expression::FreeSpace; end class WhiteSpace < Regexp::Expression::FreeSpace def merge(exp) text << exp.text end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/root.rb0000644000004100000410000000122613541126475025205 0ustar www-datawww-datamodule Regexp::Expression class Root < Regexp::Expression::Subexpression # TODO: this override is here for backwards compatibility, remove in 2.0.0 def initialize(*args) unless args.first.is_a?(Regexp::Token) warn('WARNING: Root.new without a Token argument is deprecated and '\ 'will be removed in 2.0.0. Use Root.build for the old behavior.') return super(self.class.build_token, *args) end super end class << self def build(options = {}) new(build_token, options) end def build_token Regexp::Token.new(:expression, :root, '', 0) end end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/literal.rb0000644000004100000410000000017413541126475025657 0ustar www-datawww-datamodule Regexp::Expression class Literal < Regexp::Expression::Base # Obviously nothing special here, yet. 
end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/backref.rb0000644000004100000410000000265013541126475025621 0ustar www-datawww-datamodule Regexp::Expression module Backreference class Base < Regexp::Expression::Base attr_accessor :referenced_expression end class Number < Backreference::Base attr_reader :number alias reference number def initialize(token, options = {}) @number = token.text[token.token.equal?(:number) ? 1..-1 : 3..-2].to_i super end end class Name < Backreference::Base attr_reader :name alias reference name def initialize(token, options = {}) @name = token.text[3..-2] super end end class NumberRelative < Backreference::Number attr_accessor :effective_number alias reference effective_number end class NumberCall < Backreference::Number; end class NameCall < Backreference::Name; end class NumberCallRelative < Backreference::NumberRelative; end class NumberRecursionLevel < Backreference::Number attr_reader :recursion_level def initialize(token, options = {}) super @number, @recursion_level = token.text[3..-2].split(/(?=[+-])/).map(&:to_i) end end class NameRecursionLevel < Backreference::Name attr_reader :recursion_level def initialize(token, options = {}) super @name, recursion_level = token.text[3..-2].split(/(?=[+-])/) @recursion_level = recursion_level.to_i end end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/type.rb0000644000004100000410000000127713541126475025211 0ustar www-datawww-datamodule Regexp::Expression module CharacterType class Base < Regexp::Expression::Base; end class Any < CharacterType::Base; end class Digit < CharacterType::Base; end class NonDigit < CharacterType::Base; end class Hex < CharacterType::Base; end class NonHex < CharacterType::Base; end class Word < CharacterType::Base; end class NonWord < CharacterType::Base; end class Space < CharacterType::Base; end class NonSpace < CharacterType::Base; end class Linebreak < CharacterType::Base; end class ExtendedGrapheme < CharacterType::Base; 
end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/keep.rb0000644000004100000410000000014113541126475025141 0ustar www-datawww-datamodule Regexp::Expression module Keep class Mark < Regexp::Expression::Base; end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/posix_class.rb0000644000004100000410000000026013541126475026546 0ustar www-datawww-datamodule Regexp::Expression class PosixClass < Regexp::Expression::Base def negative? type == :nonposixclass end def name token.to_s end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/anchor.rb0000644000004100000410000000135313541126475025475 0ustar www-datawww-datamodule Regexp::Expression module Anchor class Base < Regexp::Expression::Base; end class BeginningOfLine < Anchor::Base; end class EndOfLine < Anchor::Base; end class BeginningOfString < Anchor::Base; end class EndOfString < Anchor::Base; end class EndOfStringOrBeforeEndOfLine < Anchor::Base; end class WordBoundary < Anchor::Base; end class NonWordBoundary < Anchor::Base; end class MatchStart < Anchor::Base; end BOL = BeginningOfLine EOL = EndOfLine BOS = BeginningOfString EOS = EndOfString EOSobEOL = EndOfStringOrBeforeEndOfLine end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/set/0000755000004100000410000000000013541126475024467 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/expression/classes/set/range.rb0000644000004100000410000000075013541126475026112 0ustar www-datawww-datamodule Regexp::Expression class CharacterSet < Regexp::Expression::Subexpression class Range < Regexp::Expression::Subexpression def starts_at expressions.first.starts_at end alias :ts :starts_at def <<(exp) complete? && raise("Can't add more than 2 expressions to a Range") super end def complete? 
count == 2 end def to_s(_format = :full) expressions.join(text) end end end end regexp_parser-1.6.0/lib/regexp_parser/expression/classes/set/intersection.rb0000644000004100000410000000041313541126475027520 0ustar www-datawww-datamodule Regexp::Expression class CharacterSet < Regexp::Expression::Subexpression class IntersectedSequence < Regexp::Expression::Sequence; end class Intersection < Regexp::Expression::SequenceOperation OPERAND = IntersectedSequence end end end regexp_parser-1.6.0/lib/regexp_parser/expression/subexpression.rb0000644000004100000410000000246213541126475025501 0ustar www-datawww-datamodule Regexp::Expression class Subexpression < Regexp::Expression::Base include Enumerable attr_accessor :expressions def initialize(token, options = {}) super self.expressions = [] end # Override base method to clone the expressions as well. def initialize_clone(orig) self.expressions = orig.expressions.map(&:clone) super end def <<(exp) if exp.is_a?(WhiteSpace) && last && last.is_a?(WhiteSpace) last.merge(exp) else exp.nesting_level = nesting_level + 1 expressions << exp end end %w[[] at each empty? fetch index join last length values_at].each do |method| class_eval <<-RUBY, __FILE__, __LINE__ + 1 def #{method}(*args, &block) expressions.#{method}(*args, &block) end RUBY end def dig(*indices) exp = self indices.each { |idx| exp = exp.nil? || exp.terminal? ? nil : exp[idx] } exp end def te ts + to_s.length end def to_s(format = :full) # Note: the format does not get passed down to subexpressions. 
"#{expressions.join}#{quantifier_affix(format)}" end def to_h attributes.merge({ text: to_s(:base), expressions: expressions.map(&:to_h) }) end end end regexp_parser-1.6.0/lib/regexp_parser/expression/quantifier.rb0000644000004100000410000000141313541126475024732 0ustar www-datawww-datamodule Regexp::Expression class Quantifier MODES = [:greedy, :possessive, :reluctant] attr_reader :token, :text, :min, :max, :mode def initialize(token, text, min, max, mode) @token = token @text = text @mode = mode @min = min @max = max end def initialize_clone(orig) @text = orig.text.dup super end def to_s text.dup end alias :to_str :to_s def to_h { token: token, text: text, mode: mode, min: min, max: max, } end MODES.each do |mode| class_eval <<-RUBY, __FILE__, __LINE__ + 1 def #{mode}? mode.equal?(:#{mode}) end RUBY end alias :lazy? :reluctant? end end regexp_parser-1.6.0/lib/regexp_parser/expression/sequence.rb0000644000004100000410000000367413541126475024406 0ustar www-datawww-datamodule Regexp::Expression # A sequence of expressions. Differs from a Subexpressions by how it handles # quantifiers, as it applies them to its last element instead of itself as # a whole subexpression. # # Used as the base class for the Alternation alternatives, Conditional # branches, and CharacterSet::Intersection intersected sequences. 
class Sequence < Regexp::Expression::Subexpression # TODO: this override is here for backwards compatibility, remove in 2.0.0 def initialize(*args) if args.count == 3 warn('WARNING: Sequence.new without a Regexp::Token argument is '\ 'deprecated and will be removed in 2.0.0.') return self.class.at_levels(*args) end super end class << self def add_to(subexpression, params = {}, active_opts = {}) sequence = at_levels( subexpression.level, subexpression.set_level, params[:conditional_level] || subexpression.conditional_level ) sequence.nesting_level = subexpression.nesting_level + 1 sequence.options = active_opts subexpression.expressions << sequence sequence end def at_levels(level, set_level, conditional_level) token = Regexp::Token.new( :expression, :sequence, '', nil, # ts nil, # te level, set_level, conditional_level ) new(token) end end def starts_at expressions.first.starts_at end alias :ts :starts_at def quantify(token, text, min = nil, max = nil, mode = :greedy) offset = -1 target = expressions[offset] while target.is_a?(FreeSpace) target = expressions[offset -= 1] end target || raise(ArgumentError, "No valid target found for '#{text}' "\ 'quantifier') target.quantify(token, text, min, max, mode) end end end regexp_parser-1.6.0/lib/regexp_parser/scanner/0000755000004100000410000000000013541126475021471 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/scanner/char_type.rl0000644000004100000410000000176413541126475024016 0ustar www-datawww-data%%{ machine re_char_type; single_codepoint_char_type = [dDhHsSwW]; multi_codepoint_char_type = [RX]; char_type_char = single_codepoint_char_type | multi_codepoint_char_type; # Char types scanner # -------------------------------------------------------------------------- char_type := |* char_type_char { case text = text(data, ts, te, 1).first when '\d'; emit(:type, :digit, text, ts - 1, te) when '\D'; emit(:type, :nondigit, text, ts - 1, te) when '\h'; emit(:type, :hex, text, ts - 1, te) when '\H'; emit(:type, 
:nonhex, text, ts - 1, te) when '\s'; emit(:type, :space, text, ts - 1, te) when '\S'; emit(:type, :nonspace, text, ts - 1, te) when '\w'; emit(:type, :word, text, ts - 1, te) when '\W'; emit(:type, :nonword, text, ts - 1, te) when '\R'; emit(:type, :linebreak, text, ts - 1, te) when '\X'; emit(:type, :xgrapheme, text, ts - 1, te) end fret; }; *|; }%% regexp_parser-1.6.0/lib/regexp_parser/scanner/properties/0000755000004100000410000000000013541126475023665 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/scanner/properties/short.yml0000644000004100000410000001005513541126475025550 0ustar www-datawww-data# # THIS FILE IS AUTO-GENERATED BY `rake props:update`, DO NOT EDIT # --- adlm: adlam aghb: caucasian_albanian ahex: ascii_hex_digit arab: arabic armi: imperial_aramaic armn: armenian avst: avestan bali: balinese bamu: bamum bass: bassa_vah batk: batak beng: bengali bhks: bhaiksuki bidic: bidi_control bopo: bopomofo brah: brahmi brai: braille bugi: buginese buhd: buhid c: other cakm: chakma cans: canadian_aboriginal cari: carian cc: control cf: format cher: cherokee ci: case_ignorable cn: unassigned co: private_use combiningmark: mark copt: coptic cprt: cypriot cs: surrogate cwcf: changes_when_casefolded cwcm: changes_when_casemapped cwl: changes_when_lowercased cwt: changes_when_titlecased cwu: changes_when_uppercased cyrl: cyrillic dep: deprecated deva: devanagari di: default_ignorable_code_point dia: diacritic dogr: dogra dsrt: deseret dupl: duployan egyp: egyptian_hieroglyphs elba: elbasan elym: elymaic ethi: ethiopic ext: extender geor: georgian glag: glagolitic gong: gunjala_gondi gonm: masaram_gondi goth: gothic gran: grantha grbase: grapheme_base grek: greek grext: grapheme_extend grlink: grapheme_link gujr: gujarati guru: gurmukhi hang: hangul hani: han hano: hanunoo hatr: hatran hebr: hebrew hex: hex_digit hira: hiragana hluw: anatolian_hieroglyphs hmng: pahawh_hmong hmnp: nyiakeng_puachue_hmong hung: old_hungarian idc: id_continue ideo: 
ideographic ids: id_start idsb: ids_binary_operator idst: ids_trinary_operator ital: old_italic java: javanese joinc: join_control kali: kayah_li kana: katakana khar: kharoshthi khmr: khmer khoj: khojki knda: kannada kthi: kaithi l: letter lana: tai_tham laoo: lao latn: latin lc: cased_letter lepc: lepcha limb: limbu lina: linear_a linb: linear_b ll: lowercase_letter lm: modifier_letter lo: other_letter loe: logical_order_exception lt: titlecase_letter lu: uppercase_letter lyci: lycian lydi: lydian m: mark mahj: mahajani maka: makasar mand: mandaic mani: manichaean marc: marchen mc: spacing_mark me: enclosing_mark medf: medefaidrin mend: mende_kikakui merc: meroitic_cursive mero: meroitic_hieroglyphs mlym: malayalam mn: nonspacing_mark mong: mongolian mroo: mro mtei: meetei_mayek mult: multani mymr: myanmar n: number nand: nandinagari narb: old_north_arabian nbat: nabataean nchar: noncharacter_code_point nd: decimal_number nkoo: nko nl: letter_number 'no': other_number nshu: nushu oalpha: other_alphabetic odi: other_default_ignorable_code_point ogam: ogham ogrext: other_grapheme_extend oidc: other_id_continue oids: other_id_start olck: ol_chiki olower: other_lowercase omath: other_math orkh: old_turkic orya: oriya osge: osage osma: osmanya oupper: other_uppercase p: punctuation palm: palmyrene patsyn: pattern_syntax patws: pattern_white_space pauc: pau_cin_hau pc: connector_punctuation pcm: prepended_concatenation_mark pd: dash_punctuation pe: close_punctuation perm: old_permic pf: final_punctuation phag: phags_pa phli: inscriptional_pahlavi phlp: psalter_pahlavi phnx: phoenician pi: initial_punctuation plrd: miao po: other_punctuation prti: inscriptional_parthian ps: open_punctuation qaac: coptic qaai: inherited qmark: quotation_mark ri: regional_indicator rjng: rejang rohg: hanifi_rohingya runr: runic s: symbol samr: samaritan sarb: old_south_arabian saur: saurashtra sc: currency_symbol sd: soft_dotted sgnw: signwriting shaw: shavian shrd: sharada sidd: siddham 
sind: khudawadi sinh: sinhala sk: modifier_symbol sm: math_symbol so: other_symbol sogd: sogdian sogo: old_sogdian sora: sora_sompeng soyo: soyombo sterm: sentence_terminal sund: sundanese sylo: syloti_nagri syrc: syriac tagb: tagbanwa takr: takri tale: tai_le talu: new_tai_lue taml: tamil tang: tangut tavt: tai_viet telu: telugu term: terminal_punctuation tfng: tifinagh tglg: tagalog thaa: thaana tibt: tibetan tirh: tirhuta ugar: ugaritic uideo: unified_ideograph vaii: vai vs: variation_selector wara: warang_citi wcho: wancho wspace: white_space xidc: xid_continue xids: xid_start xpeo: old_persian xsux: cuneiform yiii: yi z: separator zanb: zanabazar_square zinh: inherited zl: line_separator zp: paragraph_separator zs: space_separator zyyy: common zzzz: unknown regexp_parser-1.6.0/lib/regexp_parser/scanner/properties/long.yml0000644000004100000410000004314413541126475025355 0ustar www-datawww-data# # THIS FILE IS AUTO-GENERATED BY `rake props:update`, DO NOT EDIT # --- adlam: adlam age=1.1: age=1.1 age=10.0: age=10.0 age=11.0: age=11.0 age=12.0: age=12.0 age=12.1: age=12.1 age=2.0: age=2.0 age=2.1: age=2.1 age=3.0: age=3.0 age=3.1: age=3.1 age=3.2: age=3.2 age=4.0: age=4.0 age=4.1: age=4.1 age=5.0: age=5.0 age=5.1: age=5.1 age=5.2: age=5.2 age=6.0: age=6.0 age=6.1: age=6.1 age=6.2: age=6.2 age=6.3: age=6.3 age=7.0: age=7.0 age=8.0: age=8.0 age=9.0: age=9.0 ahom: ahom alnum: alnum alpha: alpha alphabetic: alphabetic anatolianhieroglyphs: anatolian_hieroglyphs any: any arabic: arabic armenian: armenian ascii: ascii asciihexdigit: ascii_hex_digit assigned: assigned avestan: avestan balinese: balinese bamum: bamum bassavah: bassa_vah batak: batak bengali: bengali bhaiksuki: bhaiksuki bidicontrol: bidi_control blank: blank bopomofo: bopomofo brahmi: brahmi braille: braille buginese: buginese buhid: buhid canadianaboriginal: canadian_aboriginal carian: carian cased: cased casedletter: cased_letter caseignorable: case_ignorable caucasianalbanian: caucasian_albanian 
chakma: chakma cham: cham changeswhencasefolded: changes_when_casefolded changeswhencasemapped: changes_when_casemapped changeswhenlowercased: changes_when_lowercased changeswhentitlecased: changes_when_titlecased changeswhenuppercased: changes_when_uppercased cherokee: cherokee closepunctuation: close_punctuation cntrl: cntrl common: common connectorpunctuation: connector_punctuation control: control coptic: coptic cuneiform: cuneiform currencysymbol: currency_symbol cypriot: cypriot cyrillic: cyrillic dash: dash dashpunctuation: dash_punctuation decimalnumber: decimal_number defaultignorablecodepoint: default_ignorable_code_point deprecated: deprecated deseret: deseret devanagari: devanagari diacritic: diacritic digit: digit dogra: dogra duployan: duployan egyptianhieroglyphs: egyptian_hieroglyphs elbasan: elbasan elymaic: elymaic emoji: emoji emojicomponent: emoji_component emojimodifier: emoji_modifier emojimodifierbase: emoji_modifier_base emojipresentation: emoji_presentation enclosingmark: enclosing_mark ethiopic: ethiopic extender: extender finalpunctuation: final_punctuation format: format georgian: georgian glagolitic: glagolitic gothic: gothic grantha: grantha graph: graph graphemebase: grapheme_base graphemeextend: grapheme_extend graphemelink: grapheme_link greek: greek gujarati: gujarati gunjalagondi: gunjala_gondi gurmukhi: gurmukhi han: han hangul: hangul hanifirohingya: hanifi_rohingya hanunoo: hanunoo hatran: hatran hebrew: hebrew hexdigit: hex_digit hiragana: hiragana hyphen: hyphen idcontinue: id_continue ideographic: ideographic idsbinaryoperator: ids_binary_operator idstart: id_start idstrinaryoperator: ids_trinary_operator imperialaramaic: imperial_aramaic inadlam: in_adlam inaegeannumbers: in_aegean_numbers inahom: in_ahom inalchemicalsymbols: in_alchemical_symbols inalphabeticpresentationforms: in_alphabetic_presentation_forms inanatolianhieroglyphs: in_anatolian_hieroglyphs inancientgreekmusicalnotation: in_ancient_greek_musical_notation 
inancientgreeknumbers: in_ancient_greek_numbers inancientsymbols: in_ancient_symbols inarabic: in_arabic inarabicextendeda: in_arabic_extended_a inarabicmathematicalalphabeticsymbols: in_arabic_mathematical_alphabetic_symbols inarabicpresentationformsa: in_arabic_presentation_forms_a inarabicpresentationformsb: in_arabic_presentation_forms_b inarabicsupplement: in_arabic_supplement inarmenian: in_armenian inarrows: in_arrows inavestan: in_avestan inbalinese: in_balinese inbamum: in_bamum inbamumsupplement: in_bamum_supplement inbasiclatin: in_basic_latin inbassavah: in_bassa_vah inbatak: in_batak inbengali: in_bengali inbhaiksuki: in_bhaiksuki inblockelements: in_block_elements inbopomofo: in_bopomofo inbopomofoextended: in_bopomofo_extended inboxdrawing: in_box_drawing inbrahmi: in_brahmi inbraillepatterns: in_braille_patterns inbuginese: in_buginese inbuhid: in_buhid inbyzantinemusicalsymbols: in_byzantine_musical_symbols incarian: in_carian incaucasianalbanian: in_caucasian_albanian inchakma: in_chakma incham: in_cham incherokee: in_cherokee incherokeesupplement: in_cherokee_supplement inchesssymbols: in_chess_symbols incjkcompatibility: in_cjk_compatibility incjkcompatibilityforms: in_cjk_compatibility_forms incjkcompatibilityideographs: in_cjk_compatibility_ideographs incjkcompatibilityideographssupplement: in_cjk_compatibility_ideographs_supplement incjkradicalssupplement: in_cjk_radicals_supplement incjkstrokes: in_cjk_strokes incjksymbolsandpunctuation: in_cjk_symbols_and_punctuation incjkunifiedideographs: in_cjk_unified_ideographs incjkunifiedideographsextensiona: in_cjk_unified_ideographs_extension_a incjkunifiedideographsextensionb: in_cjk_unified_ideographs_extension_b incjkunifiedideographsextensionc: in_cjk_unified_ideographs_extension_c incjkunifiedideographsextensiond: in_cjk_unified_ideographs_extension_d incjkunifiedideographsextensione: in_cjk_unified_ideographs_extension_e incjkunifiedideographsextensionf: in_cjk_unified_ideographs_extension_f 
incombiningdiacriticalmarks: in_combining_diacritical_marks incombiningdiacriticalmarksextended: in_combining_diacritical_marks_extended incombiningdiacriticalmarksforsymbols: in_combining_diacritical_marks_for_symbols incombiningdiacriticalmarkssupplement: in_combining_diacritical_marks_supplement incombininghalfmarks: in_combining_half_marks incommonindicnumberforms: in_common_indic_number_forms incontrolpictures: in_control_pictures incoptic: in_coptic incopticepactnumbers: in_coptic_epact_numbers incountingrodnumerals: in_counting_rod_numerals incuneiform: in_cuneiform incuneiformnumbersandpunctuation: in_cuneiform_numbers_and_punctuation incurrencysymbols: in_currency_symbols incypriotsyllabary: in_cypriot_syllabary incyrillic: in_cyrillic incyrillicextendeda: in_cyrillic_extended_a incyrillicextendedb: in_cyrillic_extended_b incyrillicextendedc: in_cyrillic_extended_c incyrillicsupplement: in_cyrillic_supplement indeseret: in_deseret indevanagari: in_devanagari indevanagariextended: in_devanagari_extended indingbats: in_dingbats indogra: in_dogra indominotiles: in_domino_tiles induployan: in_duployan inearlydynasticcuneiform: in_early_dynastic_cuneiform inegyptianhieroglyphformatcontrols: in_egyptian_hieroglyph_format_controls inegyptianhieroglyphs: in_egyptian_hieroglyphs inelbasan: in_elbasan inelymaic: in_elymaic inemoticons: in_emoticons inenclosedalphanumerics: in_enclosed_alphanumerics inenclosedalphanumericsupplement: in_enclosed_alphanumeric_supplement inenclosedcjklettersandmonths: in_enclosed_cjk_letters_and_months inenclosedideographicsupplement: in_enclosed_ideographic_supplement inethiopic: in_ethiopic inethiopicextended: in_ethiopic_extended inethiopicextendeda: in_ethiopic_extended_a inethiopicsupplement: in_ethiopic_supplement ingeneralpunctuation: in_general_punctuation ingeometricshapes: in_geometric_shapes ingeometricshapesextended: in_geometric_shapes_extended ingeorgian: in_georgian ingeorgianextended: in_georgian_extended 
ingeorgiansupplement: in_georgian_supplement inglagolitic: in_glagolitic inglagoliticsupplement: in_glagolitic_supplement ingothic: in_gothic ingrantha: in_grantha ingreekandcoptic: in_greek_and_coptic ingreekextended: in_greek_extended ingujarati: in_gujarati ingunjalagondi: in_gunjala_gondi ingurmukhi: in_gurmukhi inhalfwidthandfullwidthforms: in_halfwidth_and_fullwidth_forms inhangulcompatibilityjamo: in_hangul_compatibility_jamo inhanguljamo: in_hangul_jamo inhanguljamoextendeda: in_hangul_jamo_extended_a inhanguljamoextendedb: in_hangul_jamo_extended_b inhangulsyllables: in_hangul_syllables inhanifirohingya: in_hanifi_rohingya inhanunoo: in_hanunoo inhatran: in_hatran inhebrew: in_hebrew inherited: inherited inhighprivateusesurrogates: in_high_private_use_surrogates inhighsurrogates: in_high_surrogates inhiragana: in_hiragana inideographicdescriptioncharacters: in_ideographic_description_characters inideographicsymbolsandpunctuation: in_ideographic_symbols_and_punctuation inimperialaramaic: in_imperial_aramaic inindicsiyaqnumbers: in_indic_siyaq_numbers ininscriptionalpahlavi: in_inscriptional_pahlavi ininscriptionalparthian: in_inscriptional_parthian inipaextensions: in_ipa_extensions initialpunctuation: initial_punctuation injavanese: in_javanese inkaithi: in_kaithi inkanaextendeda: in_kana_extended_a inkanasupplement: in_kana_supplement inkanbun: in_kanbun inkangxiradicals: in_kangxi_radicals inkannada: in_kannada inkatakana: in_katakana inkatakanaphoneticextensions: in_katakana_phonetic_extensions inkayahli: in_kayah_li inkharoshthi: in_kharoshthi inkhmer: in_khmer inkhmersymbols: in_khmer_symbols inkhojki: in_khojki inkhudawadi: in_khudawadi inlao: in_lao inlatin1supplement: in_latin_1_supplement inlatinextendeda: in_latin_extended_a inlatinextendedadditional: in_latin_extended_additional inlatinextendedb: in_latin_extended_b inlatinextendedc: in_latin_extended_c inlatinextendedd: in_latin_extended_d inlatinextendede: in_latin_extended_e inlepcha: 
in_lepcha inletterlikesymbols: in_letterlike_symbols inlimbu: in_limbu inlineara: in_linear_a inlinearbideograms: in_linear_b_ideograms inlinearbsyllabary: in_linear_b_syllabary inlisu: in_lisu inlowsurrogates: in_low_surrogates inlycian: in_lycian inlydian: in_lydian inmahajani: in_mahajani inmahjongtiles: in_mahjong_tiles inmakasar: in_makasar inmalayalam: in_malayalam inmandaic: in_mandaic inmanichaean: in_manichaean inmarchen: in_marchen inmasaramgondi: in_masaram_gondi inmathematicalalphanumericsymbols: in_mathematical_alphanumeric_symbols inmathematicaloperators: in_mathematical_operators inmayannumerals: in_mayan_numerals inmedefaidrin: in_medefaidrin inmeeteimayek: in_meetei_mayek inmeeteimayekextensions: in_meetei_mayek_extensions inmendekikakui: in_mende_kikakui inmeroiticcursive: in_meroitic_cursive inmeroitichieroglyphs: in_meroitic_hieroglyphs inmiao: in_miao inmiscellaneousmathematicalsymbolsa: in_miscellaneous_mathematical_symbols_a inmiscellaneousmathematicalsymbolsb: in_miscellaneous_mathematical_symbols_b inmiscellaneoussymbols: in_miscellaneous_symbols inmiscellaneoussymbolsandarrows: in_miscellaneous_symbols_and_arrows inmiscellaneoussymbolsandpictographs: in_miscellaneous_symbols_and_pictographs inmiscellaneoustechnical: in_miscellaneous_technical inmodi: in_modi inmodifiertoneletters: in_modifier_tone_letters inmongolian: in_mongolian inmongoliansupplement: in_mongolian_supplement inmro: in_mro inmultani: in_multani inmusicalsymbols: in_musical_symbols inmyanmar: in_myanmar inmyanmarextendeda: in_myanmar_extended_a inmyanmarextendedb: in_myanmar_extended_b innabataean: in_nabataean innandinagari: in_nandinagari innewa: in_newa innewtailue: in_new_tai_lue innko: in_nko innoblock: in_no_block innumberforms: in_number_forms innushu: in_nushu innyiakengpuachuehmong: in_nyiakeng_puachue_hmong inogham: in_ogham inolchiki: in_ol_chiki inoldhungarian: in_old_hungarian inolditalic: in_old_italic inoldnortharabian: in_old_north_arabian inoldpermic: 
in_old_permic inoldpersian: in_old_persian inoldsogdian: in_old_sogdian inoldsoutharabian: in_old_south_arabian inoldturkic: in_old_turkic inopticalcharacterrecognition: in_optical_character_recognition inoriya: in_oriya inornamentaldingbats: in_ornamental_dingbats inosage: in_osage inosmanya: in_osmanya inottomansiyaqnumbers: in_ottoman_siyaq_numbers inpahawhhmong: in_pahawh_hmong inpalmyrene: in_palmyrene inpaucinhau: in_pau_cin_hau inphagspa: in_phags_pa inphaistosdisc: in_phaistos_disc inphoenician: in_phoenician inphoneticextensions: in_phonetic_extensions inphoneticextensionssupplement: in_phonetic_extensions_supplement inplayingcards: in_playing_cards inprivateusearea: in_private_use_area inpsalterpahlavi: in_psalter_pahlavi inrejang: in_rejang inruminumeralsymbols: in_rumi_numeral_symbols inrunic: in_runic insamaritan: in_samaritan insaurashtra: in_saurashtra inscriptionalpahlavi: inscriptional_pahlavi inscriptionalparthian: inscriptional_parthian insharada: in_sharada inshavian: in_shavian inshorthandformatcontrols: in_shorthand_format_controls insiddham: in_siddham insinhala: in_sinhala insinhalaarchaicnumbers: in_sinhala_archaic_numbers insmallformvariants: in_small_form_variants insmallkanaextension: in_small_kana_extension insogdian: in_sogdian insorasompeng: in_sora_sompeng insoyombo: in_soyombo inspacingmodifierletters: in_spacing_modifier_letters inspecials: in_specials insundanese: in_sundanese insundanesesupplement: in_sundanese_supplement insuperscriptsandsubscripts: in_superscripts_and_subscripts insupplementalarrowsa: in_supplemental_arrows_a insupplementalarrowsb: in_supplemental_arrows_b insupplementalarrowsc: in_supplemental_arrows_c insupplementalmathematicaloperators: in_supplemental_mathematical_operators insupplementalpunctuation: in_supplemental_punctuation insupplementalsymbolsandpictographs: in_supplemental_symbols_and_pictographs insupplementaryprivateuseareaa: in_supplementary_private_use_area_a insupplementaryprivateuseareab: 
in_supplementary_private_use_area_b insuttonsignwriting: in_sutton_signwriting insylotinagri: in_syloti_nagri insymbolsandpictographsextendeda: in_symbols_and_pictographs_extended_a insyriac: in_syriac insyriacsupplement: in_syriac_supplement intagalog: in_tagalog intagbanwa: in_tagbanwa intags: in_tags intaile: in_tai_le intaitham: in_tai_tham intaiviet: in_tai_viet intaixuanjingsymbols: in_tai_xuan_jing_symbols intakri: in_takri intamil: in_tamil intamilsupplement: in_tamil_supplement intangut: in_tangut intangutcomponents: in_tangut_components intelugu: in_telugu inthaana: in_thaana inthai: in_thai intibetan: in_tibetan intifinagh: in_tifinagh intirhuta: in_tirhuta intransportandmapsymbols: in_transport_and_map_symbols inugaritic: in_ugaritic inunifiedcanadianaboriginalsyllabics: in_unified_canadian_aboriginal_syllabics inunifiedcanadianaboriginalsyllabicsextended: in_unified_canadian_aboriginal_syllabics_extended invai: in_vai invariationselectors: in_variation_selectors invariationselectorssupplement: in_variation_selectors_supplement invedicextensions: in_vedic_extensions inverticalforms: in_vertical_forms inwancho: in_wancho inwarangciti: in_warang_citi inyijinghexagramsymbols: in_yijing_hexagram_symbols inyiradicals: in_yi_radicals inyisyllables: in_yi_syllables inzanabazarsquare: in_zanabazar_square javanese: javanese joincontrol: join_control kaithi: kaithi kannada: kannada katakana: katakana kayahli: kayah_li kharoshthi: kharoshthi khmer: khmer khojki: khojki khudawadi: khudawadi lao: lao latin: latin lepcha: lepcha letter: letter letternumber: letter_number limbu: limbu lineara: linear_a linearb: linear_b lineseparator: line_separator lisu: lisu logicalorderexception: logical_order_exception lower: lower lowercase: lowercase lowercaseletter: lowercase_letter lycian: lycian lydian: lydian mahajani: mahajani makasar: makasar malayalam: malayalam mandaic: mandaic manichaean: manichaean marchen: marchen mark: mark masaramgondi: masaram_gondi math: math 
mathsymbol: math_symbol medefaidrin: medefaidrin meeteimayek: meetei_mayek mendekikakui: mende_kikakui meroiticcursive: meroitic_cursive meroitichieroglyphs: meroitic_hieroglyphs miao: miao modi: modi modifierletter: modifier_letter modifiersymbol: modifier_symbol mongolian: mongolian mro: mro multani: multani myanmar: myanmar nabataean: nabataean nandinagari: nandinagari newa: newa newline: newline newtailue: new_tai_lue nko: nko noncharactercodepoint: noncharacter_code_point nonspacingmark: nonspacing_mark number: number nushu: nushu nyiakengpuachuehmong: nyiakeng_puachue_hmong ogham: ogham olchiki: ol_chiki oldhungarian: old_hungarian olditalic: old_italic oldnortharabian: old_north_arabian oldpermic: old_permic oldpersian: old_persian oldsogdian: old_sogdian oldsoutharabian: old_south_arabian oldturkic: old_turkic openpunctuation: open_punctuation oriya: oriya osage: osage osmanya: osmanya other: other otheralphabetic: other_alphabetic otherdefaultignorablecodepoint: other_default_ignorable_code_point othergraphemeextend: other_grapheme_extend otheridcontinue: other_id_continue otheridstart: other_id_start otherletter: other_letter otherlowercase: other_lowercase othermath: other_math othernumber: other_number otherpunctuation: other_punctuation othersymbol: other_symbol otheruppercase: other_uppercase pahawhhmong: pahawh_hmong palmyrene: palmyrene paragraphseparator: paragraph_separator patternsyntax: pattern_syntax patternwhitespace: pattern_white_space paucinhau: pau_cin_hau phagspa: phags_pa phoenician: phoenician prependedconcatenationmark: prepended_concatenation_mark print: print privateuse: private_use psalterpahlavi: psalter_pahlavi punct: punct punctuation: punctuation quotationmark: quotation_mark radical: radical regionalindicator: regional_indicator rejang: rejang runic: runic samaritan: samaritan saurashtra: saurashtra sentenceterminal: sentence_terminal separator: separator sharada: sharada shavian: shavian siddham: siddham signwriting: 
signwriting sinhala: sinhala softdotted: soft_dotted sogdian: sogdian sorasompeng: sora_sompeng soyombo: soyombo space: space spaceseparator: space_separator spacingmark: spacing_mark sundanese: sundanese surrogate: surrogate sylotinagri: syloti_nagri symbol: symbol syriac: syriac tagalog: tagalog tagbanwa: tagbanwa taile: tai_le taitham: tai_tham taiviet: tai_viet takri: takri tamil: tamil tangut: tangut telugu: telugu terminalpunctuation: terminal_punctuation thaana: thaana thai: thai tibetan: tibetan tifinagh: tifinagh tirhuta: tirhuta titlecaseletter: titlecase_letter ugaritic: ugaritic unassigned: unassigned unifiedideograph: unified_ideograph unknown: unknown upper: upper uppercase: uppercase uppercaseletter: uppercase_letter vai: vai variationselector: variation_selector wancho: wancho warangciti: warang_citi whitespace: white_space word: word xdigit: xdigit xidcontinue: xid_continue xidstart: xid_start xposixpunct: xposixpunct yi: yi zanabazarsquare: zanabazar_square regexp_parser-1.6.0/lib/regexp_parser/scanner/scanner.rl0000644000004100000410000006663213541126475023476 0ustar www-datawww-data%%{ machine re_scanner; include re_char_type "char_type.rl"; include re_property "property.rl"; dot = '.'; backslash = '\\'; alternation = '|'; beginning_of_line = '^'; end_of_line = '$'; range_open = '{'; range_close = '}'; curlies = range_open | range_close; group_open = '('; group_close = ')'; parantheses = group_open | group_close; set_open = '['; set_close = ']'; brackets = set_open | set_close; comment = ('#' . [^\n]* . '\n'); class_name_posix = 'alnum' | 'alpha' | 'blank' | 'cntrl' | 'digit' | 'graph' | 'lower' | 'print' | 'punct' | 'space' | 'upper' | 'xdigit' | 'word' | 'ascii'; class_posix = ('[:' . '^'? . class_name_posix . ':]'); # these are not supported in ruby, and need verification collating_sequence = '[.' . (alpha | [\-])+ . '.]'; character_equivalent = '[=' . alpha . 
'=]'; line_anchor = beginning_of_line | end_of_line; anchor_char = [AbBzZG]; escaped_ascii = [abefnrtv]; octal_sequence = [0-7]{1,3}; hex_sequence = 'x' . xdigit{1,2}; hex_sequence_err = 'x' . [^0-9a-fA-F{]; codepoint_single = 'u' . xdigit{4}; codepoint_list = 'u{' . xdigit{1,6} . (space . xdigit{1,6})* . '}'; codepoint_sequence = codepoint_single | codepoint_list; control_sequence = ('c' | 'C-') . (backslash . 'M-')? . backslash? . any; meta_sequence = 'M-' . (backslash . ('c' | 'C-'))? . backslash? . any; zero_or_one = '?' | '??' | '?+'; zero_or_more = '*' | '*?' | '*+'; one_or_more = '+' | '+?' | '++'; quantifier_greedy = '?' | '*' | '+'; quantifier_reluctant = '??' | '*?' | '+?'; quantifier_possessive = '?+' | '*+' | '++'; quantifier_mode = '?' | '+'; quantifier_interval = range_open . (digit+)? . ','? . (digit+)? . range_close . quantifier_mode?; quantifiers = quantifier_greedy | quantifier_reluctant | quantifier_possessive | quantifier_interval; conditional = '(?('; group_comment = '?#' . [^)]* . group_close; group_atomic = '?>'; group_passive = '?:'; group_absence = '?~'; assertion_lookahead = '?='; assertion_nlookahead = '?!'; assertion_lookbehind = '?<='; assertion_nlookbehind = '?<!'; group_options = '?' . ( [^!#'():<=>~]+ . ':'? ) ?; group_ref = [gk]; group_name_char = (alnum | '_'); group_name_id = (group_name_char . (group_name_char+)?)?; group_number = '-'? . [1-9] . ([0-9]+)?; group_level = [+\-] . [0-9]+; group_name = ('<' . group_name_id . '>') | ("'" . group_name_id . "'"); group_lookup = group_name | group_number; group_named = ('?' . group_name ); group_name_ref = group_ref . (('<' . group_name_id . group_level? '>') | ("'" . group_name_id . group_level? "'")); group_number_ref = group_ref . (('<' . group_number . group_level? '>') | ("'" . group_number . group_level? 
"'")); group_type = group_atomic | group_passive | group_absence | group_named; keep_mark = 'K'; assertion_type = assertion_lookahead | assertion_nlookahead | assertion_lookbehind | assertion_nlookbehind; # characters that 'break' a literal meta_char = dot | backslash | alternation | curlies | parantheses | brackets | line_anchor | quantifier_greedy; ascii_print = ((0x20..0x7e) - meta_char); ascii_nonprint = (0x01..0x1f | 0x7f); utf8_2_byte = (0xc2..0xdf 0x80..0xbf); utf8_3_byte = (0xe0..0xef 0x80..0xbf 0x80..0xbf); utf8_4_byte = (0xf0..0xf4 0x80..0xbf 0x80..0xbf 0x80..0xbf); non_literal_escape = char_type_char | anchor_char | escaped_ascii | group_ref | keep_mark | [xucCM]; non_set_escape = (anchor_char - 'b') | group_ref | keep_mark | multi_codepoint_char_type | [0-9cCM]; # EOF error, used where it can be detected action premature_end_error { text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) } # Invalid sequence error, used from sequences, like escapes and sets action invalid_sequence_error { text = ts ? copy(data, ts-1..-1) : data.pack('c*') validation_error(:sequence, 'sequence', text) } # group (nesting) and set open/close actions action group_opened { self.group_depth = group_depth + 1 } action group_closed { self.group_depth = group_depth - 1 } action set_opened { self.set_depth = set_depth + 1 } action set_closed { self.set_depth = set_depth - 1 } # Character set scanner, continues consuming characters until it meets the # closing bracket of the set. # -------------------------------------------------------------------------- character_set := |* set_close > (set_meta, 2) @set_closed { emit(:set, :close, *text(data, ts, te)) if in_set? fret; else fgoto main; end }; '-]' @set_closed { # special case, emits two tokens emit(:literal, :literal, copy(data, ts..te-2), ts, te - 1) emit(:set, :close, copy(data, ts+1..te-1), ts + 1, te) if in_set? 
fret; else fgoto main; end }; '-&&' { # special case, emits two tokens emit(:literal, :literal, '-', ts, te) emit(:set, :intersection, '&&', ts, te) }; '^' { text = text(data, ts, te).first if tokens.last[1] == :open emit(:set, :negate, text, ts, te) else emit(:literal, :literal, text, ts, te) end }; '-' { text = text(data, ts, te).first # ranges can't start with a subset or intersection/negation/range operator if tokens.last[0] == :set emit(:literal, :literal, text, ts, te) else emit(:set, :range, text, ts, te) end }; # Unlike ranges, intersections can start or end at set boundaries, whereupon # they match nothing: r = /[a&&]/; [r =~ ?a, r =~ ?&] # => [nil, nil] '&&' { emit(:set, :intersection, *text(data, ts, te)) }; backslash { fcall set_escape_sequence; }; set_open >(open_bracket, 1) >set_opened { emit(:set, :open, *text(data, ts, te)) fcall character_set; }; class_posix >(open_bracket, 1) @set_closed @eof(premature_end_error) { text = text(data, ts, te).first type = :posixclass class_name = text[2..-3] if class_name[0].chr == '^' class_name = class_name[1..-1] type = :nonposixclass end emit(type, class_name.to_sym, text, ts, te) }; collating_sequence >(open_bracket, 1) @set_closed @eof(premature_end_error) { emit(:set, :collation, *text(data, ts, te)) }; character_equivalent >(open_bracket, 1) @set_closed @eof(premature_end_error) { emit(:set, :equivalent, *text(data, ts, te)) }; meta_char > (set_meta, 1) { emit(:literal, :literal, *text(data, ts, te)) }; any | ascii_nonprint | utf8_2_byte | utf8_3_byte | utf8_4_byte { char, *rest = *text(data, ts, te) char.force_encoding('utf-8') if char.respond_to?(:force_encoding) emit(:literal, :literal, char, *rest) }; *|; # set escapes scanner # -------------------------------------------------------------------------- set_escape_sequence := |* non_set_escape > (escaped_set_alpha, 2) { emit(:escape, :literal, *text(data, ts, te, 1)) fret; }; any > (escaped_set_alpha, 1) { fhold; fnext character_set; fcall escape_sequence; 
}; *|; # escape sequence scanner # -------------------------------------------------------------------------- escape_sequence := |* [1-9] { text = text(data, ts, te, 1).first emit(:backref, :number, text, ts-1, te) fret; }; octal_sequence { emit(:escape, :octal, *text(data, ts, te, 1)) fret; }; meta_char { case text = text(data, ts, te, 1).first when '\.'; emit(:escape, :dot, text, ts-1, te) when '\|'; emit(:escape, :alternation, text, ts-1, te) when '\^'; emit(:escape, :bol, text, ts-1, te) when '\$'; emit(:escape, :eol, text, ts-1, te) when '\?'; emit(:escape, :zero_or_one, text, ts-1, te) when '\*'; emit(:escape, :zero_or_more, text, ts-1, te) when '\+'; emit(:escape, :one_or_more, text, ts-1, te) when '\('; emit(:escape, :group_open, text, ts-1, te) when '\)'; emit(:escape, :group_close, text, ts-1, te) when '\{'; emit(:escape, :interval_open, text, ts-1, te) when '\}'; emit(:escape, :interval_close, text, ts-1, te) when '\['; emit(:escape, :set_open, text, ts-1, te) when '\]'; emit(:escape, :set_close, text, ts-1, te) when "\\\\"; emit(:escape, :backslash, text, ts-1, te) end fret; }; escaped_ascii > (escaped_alpha, 7) { # \b is emitted as backspace only when inside a character set, otherwise # it is a word boundary anchor. A syntax might "normalize" it if needed. 
case text = text(data, ts, te, 1).first when '\a'; emit(:escape, :bell, text, ts-1, te) when '\b'; emit(:escape, :backspace, text, ts-1, te) when '\e'; emit(:escape, :escape, text, ts-1, te) when '\f'; emit(:escape, :form_feed, text, ts-1, te) when '\n'; emit(:escape, :newline, text, ts-1, te) when '\r'; emit(:escape, :carriage, text, ts-1, te) when '\t'; emit(:escape, :tab, text, ts-1, te) when '\v'; emit(:escape, :vertical_tab, text, ts-1, te) end fret; }; codepoint_sequence > (escaped_alpha, 6) $eof(premature_end_error) { text = text(data, ts, te, 1).first if text[2].chr == '{' emit(:escape, :codepoint_list, text, ts-1, te) else emit(:escape, :codepoint, text, ts-1, te) end fret; }; hex_sequence > (escaped_alpha, 5) $eof(premature_end_error) { emit(:escape, :hex, *text(data, ts, te, 1)) fret; }; hex_sequence_err @invalid_sequence_error { fret; }; control_sequence >(escaped_alpha, 4) $eof(premature_end_error) { emit_meta_control_sequence(data, ts, te, :control) fret; }; meta_sequence >(backslashed, 3) $eof(premature_end_error) { emit_meta_control_sequence(data, ts, te, :meta_sequence) fret; }; char_type_char > (escaped_alpha, 2) { fhold; fnext *(in_set? ? fentry(character_set) : fentry(main)); fcall char_type; }; property_char > (escaped_alpha, 2) { fhold; fnext *(in_set? ? fentry(character_set) : fentry(main)); fcall unicode_property; }; (any -- non_literal_escape) > (escaped_alpha, 1) { emit(:escape, :literal, *text(data, ts, te, 1)) fret; }; *|; # conditional expressions scanner # -------------------------------------------------------------------------- conditional_expression := |* group_lookup . 
')' { text = text(data, ts, te-1).first emit(:conditional, :condition, text, ts, te-1) emit(:conditional, :condition_close, ')', te-1, te) }; any { fhold; fcall main; }; *|; # Main scanner # -------------------------------------------------------------------------- main := |* # Meta characters # ------------------------------------------------------------------------ dot { emit(:meta, :dot, *text(data, ts, te)) }; alternation { if conditional_stack.last == group_depth emit(:conditional, :separator, *text(data, ts, te)) else emit(:meta, :alternation, *text(data, ts, te)) end }; # Anchors # ------------------------------------------------------------------------ beginning_of_line { emit(:anchor, :bol, *text(data, ts, te)) }; end_of_line { emit(:anchor, :eol, *text(data, ts, te)) }; backslash . keep_mark > (backslashed, 4) { emit(:keep, :mark, *text(data, ts, te)) }; backslash . anchor_char > (backslashed, 3) { case text = text(data, ts, te).first when '\\A'; emit(:anchor, :bos, text, ts, te) when '\\z'; emit(:anchor, :eos, text, ts, te) when '\\Z'; emit(:anchor, :eos_ob_eol, text, ts, te) when '\\b'; emit(:anchor, :word_boundary, text, ts, te) when '\\B'; emit(:anchor, :nonword_boundary, text, ts, te) when '\\G'; emit(:anchor, :match_start, text, ts, te) end }; # Character sets # ------------------------------------------------------------------------ set_open >set_opened { emit(:set, :open, *text(data, ts, te)) fcall character_set; }; # Conditional expression # (?(condition)Y|N) conditional expression # ------------------------------------------------------------------------ conditional { text = text(data, ts, te).first conditional_stack << group_depth emit(:conditional, :open, text[0..-2], ts, te-1) emit(:conditional, :condition_open, '(', te-1, te) fcall conditional_expression; }; # (?#...) comments: parsed as a single expression, without introducing a # new nesting level. Comments may not include parentheses, escaped or not. 
# special case for close, action performed on all transitions to get the # correct closing count. # ------------------------------------------------------------------------ group_open . group_comment $group_closed { emit(:group, :comment, *text(data, ts, te)) }; # Expression options: # (?imxdau-imx) option on/off # i: ignore case # m: multi-line (dot(.) match newline) # x: extended form # d: default class rules (1.9 compatible) # a: ASCII class rules (\s, \w, etc.) # u: Unicode class rules (\s, \w, etc.) # # (?imxdau-imx:subexp) option on/off for subexp # ------------------------------------------------------------------------ group_open . group_options >group_opened { text = text(data, ts, te).first if text[2..-1] =~ /([^\-mixdau:]|^$)|-.*([dau])/ raise InvalidGroupOption.new($1 || "-#{$2}", text) end emit_options(text, ts, te) }; # Assertions # (?=subexp) look-ahead # (?!subexp) negative look-ahead # (?<=subexp) look-behind # (?<!subexp) negative look-behind # ------------------------------------------------------------------------ group_open . assertion_type >group_opened { case text = text(data, ts, te).first when '(?='; emit(:assertion, :lookahead, text, ts, te) when '(?!'; emit(:assertion, :nlookahead, text, ts, te) when '(?<='; emit(:assertion, :lookbehind, text, ts, te) when '(?<!'; emit(:assertion, :nlookbehind, text, ts, te) end }; # Groups # (?:subexp) passive (non-captured) group # (?>subexp) atomic group, don't backtrack in subexp. # (?~subexp) absence group, matches anything that is not subexp # (?<name>subexp) named group # (?'name'subexp) named group (single quoted version) # (subexp) captured group # ------------------------------------------------------------------------ group_open . 
group_type >group_opened { case text = text(data, ts, te).first when '(?:'; emit(:group, :passive, text, ts, te) when '(?>'; emit(:group, :atomic, text, ts, te) when '(?~'; emit(:group, :absence, text, ts, te) when /^\(\?(?:<>|'')/ validation_error(:group, 'named group', 'name is empty') when /^\(\?<\w*>/ emit(:group, :named_ab, text, ts, te) when /^\(\?'\w*'/ emit(:group, :named_sq, text, ts, te) end }; group_open @group_opened { text = text(data, ts, te).first emit(:group, :capture, text, ts, te) }; group_close @group_closed { if conditional_stack.last == group_depth + 1 conditional_stack.pop emit(:conditional, :close, *text(data, ts, te)) else if spacing_stack.length > 1 && spacing_stack.last[:depth] == group_depth + 1 spacing_stack.pop self.free_spacing = spacing_stack.last[:free_spacing] end emit(:group, :close, *text(data, ts, te)) end }; # Group backreference, named and numbered # ------------------------------------------------------------------------ backslash . (group_name_ref | group_number_ref) > (backslashed, 4) { case text = text(data, ts, te).first when /^\\([gk])(<>|'')/ # angle brackets validation_error(:backref, 'ref/call', 'ref ID is empty') when /^\\([gk])<[^\d+-]\w*>/ # angle-brackets if $1 == 'k' emit(:backref, :name_ref_ab, text, ts, te) else emit(:backref, :name_call_ab, text, ts, te) end when /^\\([gk])'[^\d+-]\w*'/ #single quotes if $1 == 'k' emit(:backref, :name_ref_sq, text, ts, te) else emit(:backref, :name_call_sq, text, ts, te) end when /^\\([gk])<\d+>/ # angle-brackets if $1 == 'k' emit(:backref, :number_ref_ab, text, ts, te) else emit(:backref, :number_call_ab, text, ts, te) end when /^\\([gk])'\d+'/ # single quotes if $1 == 'k' emit(:backref, :number_ref_sq, text, ts, te) else emit(:backref, :number_call_sq, text, ts, te) end when /^\\(?:g<\+|g<-|(k)<-)\d+>/ # angle-brackets if $1 == 'k' emit(:backref, :number_rel_ref_ab, text, ts, te) else emit(:backref, :number_rel_call_ab, text, ts, te) end when /^\\(?:g'\+|g'-|(k)'-)\d+'/ # 
single quotes if $1 == 'k' emit(:backref, :number_rel_ref_sq, text, ts, te) else emit(:backref, :number_rel_call_sq, text, ts, te) end when /^\\k<[^\d+\-]\w*[+\-]\d+>/ # angle-brackets emit(:backref, :name_recursion_ref_ab, text, ts, te) when /^\\k'[^\d+\-]\w*[+\-]\d+'/ # single-quotes emit(:backref, :name_recursion_ref_sq, text, ts, te) when /^\\([gk])<[+\-]?\d+[+\-]\d+>/ # angle-brackets emit(:backref, :number_recursion_ref_ab, text, ts, te) when /^\\([gk])'[+\-]?\d+[+\-]\d+'/ # single-quotes emit(:backref, :number_recursion_ref_sq, text, ts, te) end }; # Quantifiers # ------------------------------------------------------------------------ zero_or_one { case text = text(data, ts, te).first when '?' ; emit(:quantifier, :zero_or_one, text, ts, te) when '??'; emit(:quantifier, :zero_or_one_reluctant, text, ts, te) when '?+'; emit(:quantifier, :zero_or_one_possessive, text, ts, te) end }; zero_or_more { case text = text(data, ts, te).first when '*' ; emit(:quantifier, :zero_or_more, text, ts, te) when '*?'; emit(:quantifier, :zero_or_more_reluctant, text, ts, te) when '*+'; emit(:quantifier, :zero_or_more_possessive, text, ts, te) end }; one_or_more { case text = text(data, ts, te).first when '+' ; emit(:quantifier, :one_or_more, text, ts, te) when '+?'; emit(:quantifier, :one_or_more_reluctant, text, ts, te) when '++'; emit(:quantifier, :one_or_more_possessive, text, ts, te) end }; quantifier_interval @err(premature_end_error) { emit(:quantifier, :interval, *text(data, ts, te)) }; # Escaped sequences # ------------------------------------------------------------------------ backslash > (backslashed, 1) { fcall escape_sequence; }; comment { if free_spacing emit(:free_space, :comment, *text(data, ts, te)) else append_literal(data, ts, te) end }; space+ { if free_spacing emit(:free_space, :whitespace, *text(data, ts, te)) else append_literal(data, ts, te) end }; # Literal: any run of ASCII (printable or non-printable), and/or UTF-8, # except meta characters. 
# ------------------------------------------------------------------------ (ascii_print -- space)+ | ascii_nonprint+ | utf8_2_byte+ | utf8_3_byte+ | utf8_4_byte+ { append_literal(data, ts, te) }; *|; }%% # THIS IS A GENERATED FILE, DO NOT EDIT DIRECTLY # This file was generated from lib/regexp_parser/scanner/scanner.rl class Regexp::Scanner # General scanner error (catch all) class ScannerError < StandardError; end # Base for all scanner validation errors class ValidationError < StandardError def initialize(reason) super reason end end # Unexpected end of pattern class PrematureEndError < ScannerError def initialize(where = '') super "Premature end of pattern at #{where}" end end # Invalid sequence format. Used for escape sequences, mainly. class InvalidSequenceError < ValidationError def initialize(what = 'sequence', where = '') super "Invalid #{what} at #{where}" end end # Invalid group. Used for named groups. class InvalidGroupError < ValidationError def initialize(what, reason) super "Invalid #{what}, #{reason}." end end # Invalid group option. Used for inline options. class InvalidGroupOption < ValidationError def initialize(option, text) super "Invalid group option #{option} in #{text}" end end # Invalid back reference. Used for named and numbered refs/calls. class InvalidBackrefError < ValidationError def initialize(what, reason) super "Invalid back reference #{what}, #{reason}" end end # The property name was not recognized by the scanner. class UnknownUnicodePropertyError < ValidationError def initialize(name) super "Unknown unicode character property name #{name}" end end # Scans the given regular expression text, or Regexp object and collects the # emitted tokens into an array that gets returned at the end. If a block is # given, it gets called for each emitted token. # # This method may raise errors if a syntax error is encountered. 
# -------------------------------------------------------------------------- def self.scan(input_object, &block) new.scan(input_object, &block) end def scan(input_object, &block) self.literal = nil stack = [] if input_object.is_a?(Regexp) input = input_object.source self.free_spacing = (input_object.options & Regexp::EXTENDED != 0) else input = input_object self.free_spacing = false end self.spacing_stack = [{:free_spacing => free_spacing, :depth => 0}] data = input.unpack("c*") if input.is_a?(String) eof = data.length self.tokens = [] self.block = block_given? ? block : nil self.set_depth = 0 self.group_depth = 0 self.conditional_stack = [] %% write data; %% write init; %% write exec; # to avoid "warning: assigned but unused variable - testEof" testEof = testEof if cs == re_scanner_error text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise ScannerError.new("Scan error at '#{text}'") end raise PrematureEndError.new("(missing group closing paranthesis) "+ "[#{group_depth}]") if in_group? raise PrematureEndError.new("(missing set closing bracket) "+ "[#{set_depth}]") if in_set? # when the entire expression is a literal run emit_literal if literal tokens end # lazy-load property maps when first needed require 'yaml' PROP_MAPS_DIR = File.expand_path('../scanner/properties', __FILE__) def self.short_prop_map @short_prop_map ||= YAML.load_file("#{PROP_MAPS_DIR}/short.yml") end def self.long_prop_map @long_prop_map ||= YAML.load_file("#{PROP_MAPS_DIR}/long.yml") end # Emits an array with the details of the scanned pattern def emit(type, token, text, ts, te) #puts "EMIT: type: #{type}, token: #{token}, text: #{text}, ts: #{ts}, te: #{te}" emit_literal if literal if block block.call type, token, text, ts, te end tokens << [type, token, text, ts, te] end private attr_accessor :tokens, :literal, :block, :free_spacing, :spacing_stack, :group_depth, :set_depth, :conditional_stack def in_group? group_depth > 0 end def in_set? 
set_depth > 0 end # Copy from ts to te from data as text def copy(data, range) data[range].pack('c*') end # Copy from ts to te from data as text, returning an array with the text # and the offsets used to copy it. def text(data, ts, te, soff = 0) [copy(data, ts-soff..te-1), ts-soff, te] end # Appends one or more characters to the literal buffer, to be emitted later # by a call to emit_literal. Contents can be a mix of ASCII and UTF-8. def append_literal(data, ts, te) self.literal = literal || [] literal << text(data, ts, te) end # Emits the literal run collected by calls to the append_literal method, # using the total start (ts) and end (te) offsets of the run. def emit_literal ts, te = literal.first[1], literal.last[2] text = literal.map {|t| t[0]}.join text.force_encoding('utf-8') if text.respond_to?(:force_encoding) self.literal = nil emit(:literal, :literal, text, ts, te) end def emit_options(text, ts, te) token = nil # Ruby allows things like '(?-xxxx)' or '(?xx-xx--xx-:abc)'. text =~ /\(\?([mixdau]*)(-(?:[mix]*))*(:)?/ positive, negative, group_local = $1, $2, $3 if positive.include?('x') self.free_spacing = true end # If the x appears in both, treat it like ruby does, the second cancels # the first. if negative && negative.include?('x') self.free_spacing = false end if group_local spacing_stack << {:free_spacing => free_spacing, :depth => group_depth} token = :options else # switch for parent group level spacing_stack.last[:free_spacing] = free_spacing token = :options_switch end emit(:group, token, text, ts, te) end def emit_meta_control_sequence(data, ts, te, token) if data.last < 0x00 || data.last > 0x7F validation_error(:sequence, 'escape', token.to_s) end emit(:escape, token, *text(data, ts, te, 1)) end # Centralizes and unifies the handling of validation related # errors. 
def validation_error(type, what, reason) case type when :group error = InvalidGroupError.new(what, reason) when :backref error = InvalidBackrefError.new(what, reason) when :sequence error = InvalidSequenceError.new(what, reason) end raise error # unless @@config.validation_ignore end end # module Regexp::Scanner regexp_parser-1.6.0/lib/regexp_parser/scanner/property.rl0000644000004100000410000000150213541126475023712 0ustar www-datawww-data%%{ machine re_property; property_char = [pP]; property_sequence = property_char . '{' . '^'? (alnum|space|[_\-\.=])+ '}'; action premature_property_end { raise PrematureEndError.new('unicode property') } # Unicode properties scanner # -------------------------------------------------------------------------- unicode_property := |* property_sequence < eof(premature_property_end) { text = text(data, ts, te, 1).first type = (text[1] == 'P') ^ (text[3] == '^') ? :nonproperty : :property name = data[ts+2..te-2].pack('c*').gsub(/[\^\s_\-]/, '').downcase token = self.class.short_prop_map[name] || self.class.long_prop_map[name] raise UnknownUnicodePropertyError.new(name) unless token self.emit(type, token.to_sym, text, ts-1, te) fret; }; *|; }%% regexp_parser-1.6.0/lib/regexp_parser/syntax/0000755000004100000410000000000013541126476021367 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/syntax/versions/0000755000004100000410000000000013541126476023237 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.5.0.rb0000644000004100000410000000033413541126476024226 0ustar www-datawww-datamodule Regexp::Syntax class V2_5_0 < Regexp::Syntax::V2_4 def initialize super implements :property, UnicodeProperty::V2_5_0 implements :nonproperty, UnicodeProperty::V2_5_0 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.4.0.rb0000644000004100000410000000033413541126476024225 0ustar www-datawww-datamodule Regexp::Syntax class V2_4_0 < Regexp::Syntax::V2_3 def initialize super implements :property, 
UnicodeProperty::V2_4_0 implements :nonproperty, UnicodeProperty::V2_4_0 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.6.3.rb0000644000004100000410000000033613541126476024234 0ustar www-datawww-datamodule Regexp::Syntax class V2_6_3 < Regexp::Syntax::V2_6_2 def initialize super implements :property, UnicodeProperty::V2_6_3 implements :nonproperty, UnicodeProperty::V2_6_3 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.4.1.rb0000644000004100000410000000022713541126476024227 0ustar www-datawww-datamodule Regexp::Syntax class V2_4_1 < Regexp::Syntax::V2_4_0 def initialize super implements :group, Group::V2_4_1 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.0.0.rb0000644000004100000410000000074713541126476024231 0ustar www-datawww-datamodule Regexp::Syntax # use the last 1.9 release as the base class V2_0_0 < Regexp::Syntax::V1_9 def initialize super implements :keep, Keep::All implements :conditional, Conditional::All implements :property, UnicodeProperty::V2_0_0 implements :nonproperty, UnicodeProperty::V2_0_0 implements :type, CharacterType::Clustered excludes :property, :newline excludes :nonproperty, :newline end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/1.9.1.rb0000644000004100000410000000121413541126476024230 0ustar www-datawww-datamodule Regexp::Syntax class V1_9_1 < Regexp::Syntax::V1_8_6 def initialize super implements :assertion, Assertion::Lookbehind implements :backref, Backreference::All + SubexpressionCall::All implements :posixclass, PosixClass::Extensions implements :nonposixclass, PosixClass::All implements :escape, Escape::Unicode + Escape::Hex + Escape::Octal implements :type, CharacterType::Hex implements :property, UnicodeProperty::V1_9_0 implements :nonproperty, UnicodeProperty::V1_9_0 implements :quantifier, Quantifier::Possessive + Quantifier::IntervalPossessive end end end 
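The version files above all follow one pattern: each class calls `super` in `initialize` and then adds or removes tokens on top of what its parent declared. The following is a minimal sketch of that layering, not the gem's code. `MiniSyntax`, `MiniV1`, and `MiniV2` are hypothetical names invented for illustration; only the `implements` / `excludes` / `implements?` bodies mirror `Regexp::Syntax::Base` as it appears in this archive.

```ruby
require 'set'

# Hypothetical stand-in for Regexp::Syntax::Base (illustration only).
class MiniSyntax
  def initialize
    @implements = {}
  end

  # One Set of tokens per type, created lazily.
  def implementations(type)
    @implements[type] ||= Set.new
  end

  def implements(type, tokens)
    implementations(type).merge(Array(tokens))
  end

  def excludes(type, tokens)
    implementations(type).subtract(Array(tokens))
  end

  def implements?(type, token)
    implementations(type).include?(token)
  end
end

# Hypothetical "older" syntax version.
class MiniV1 < MiniSyntax
  def initialize
    super
    implements :quantifier, [:zero_or_one, :zero_or_more]
    implements :property, [:alnum, :newline]
  end
end

# Hypothetical "newer" version: inherits everything, adds one token,
# and drops another, much as V2_0_0 excludes :newline above.
class MiniV2 < MiniV1
  def initialize
    super
    implements :quantifier, [:zero_or_one_possessive]
    excludes :property, :newline
  end
end

v1 = MiniV1.new
v2 = MiniV2.new
puts v1.implements?(:property, :newline)       # true
puts v2.implements?(:property, :newline)       # false
puts v2.implements?(:quantifier, :zero_or_one) # true
```

Because each subclass only records the difference from its parent, a version file that adds nothing but new Unicode properties stays a few lines long.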
regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/1.8.6.rb0000644000004100000410000000126313541126476024240 0ustar www-datawww-datamodule Regexp::Syntax class V1_8_6 < Regexp::Syntax::Base def initialize super implements :anchor, Anchor::All implements :assertion, Assertion::Lookahead implements :backref, [:number] implements :posixclass, PosixClass::Standard implements :group, Group::All implements :meta, Meta::Extended implements :set, CharacterSet::All implements :type, CharacterType::Extended implements :escape, Escape::Basic + Escape::ASCII + Escape::Meta + Escape::Control implements :quantifier, Quantifier::Greedy + Quantifier::Reluctant + Quantifier::Interval + Quantifier::IntervalReluctant end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.6.0.rb0000644000004100000410000000033413541126476024227 0ustar www-datawww-datamodule Regexp::Syntax class V2_6_0 < Regexp::Syntax::V2_5 def initialize super implements :property, UnicodeProperty::V2_6_0 implements :nonproperty, UnicodeProperty::V2_6_0 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.3.0.rb0000644000004100000410000000033413541126476024224 0ustar www-datawww-datamodule Regexp::Syntax class V2_3_0 < Regexp::Syntax::V2_2 def initialize super implements :property, UnicodeProperty::V2_3_0 implements :nonproperty, UnicodeProperty::V2_3_0 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.6.2.rb0000644000004100000410000000033613541126476024233 0ustar www-datawww-datamodule Regexp::Syntax class V2_6_2 < Regexp::Syntax::V2_6_0 def initialize super implements :property, UnicodeProperty::V2_6_2 implements :nonproperty, UnicodeProperty::V2_6_2 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/1.9.3.rb0000644000004100000410000000043713541126476024240 0ustar www-datawww-datamodule Regexp::Syntax class V1_9_3 < Regexp::Syntax::V1_9_1 def initialize super # these were added with update of Oniguruma to Unicode 6.0 implements :property, 
UnicodeProperty::V1_9_3 implements :nonproperty, UnicodeProperty::V1_9_3 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions/2.2.0.rb0000644000004100000410000000033413541126476024223 0ustar www-datawww-datamodule Regexp::Syntax class V2_2_0 < Regexp::Syntax::V2_1 def initialize super implements :property, UnicodeProperty::V2_2_0 implements :nonproperty, UnicodeProperty::V2_2_0 end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens.rb0000644000004100000410000000241113541126475023214 0ustar www-datawww-data# Define the base module and the simplest of tokens. module Regexp::Syntax module Token Map = {} module Literal All = [:literal] Type = :literal end module FreeSpace All = [:comment, :whitespace] Type = :free_space end Map[FreeSpace::Type] = FreeSpace::All Map[Literal::Type] = Literal::All end end # Load all the token files, they will populate the Map constant. require 'regexp_parser/syntax/tokens/anchor' require 'regexp_parser/syntax/tokens/assertion' require 'regexp_parser/syntax/tokens/backref' require 'regexp_parser/syntax/tokens/posix_class' require 'regexp_parser/syntax/tokens/character_set' require 'regexp_parser/syntax/tokens/character_type' require 'regexp_parser/syntax/tokens/conditional' require 'regexp_parser/syntax/tokens/escape' require 'regexp_parser/syntax/tokens/group' require 'regexp_parser/syntax/tokens/keep' require 'regexp_parser/syntax/tokens/meta' require 'regexp_parser/syntax/tokens/quantifier' require 'regexp_parser/syntax/tokens/unicode_property' # After loading all the tokens the map is full. Extract all tokens and types # into the All and Types constants. module Regexp::Syntax module Token All = Map.values.flatten.uniq.sort.freeze Types = Map.keys.freeze end end regexp_parser-1.6.0/lib/regexp_parser/syntax/any.rb0000644000004100000410000000060613541126475022504 0ustar www-datawww-datamodule Regexp::Syntax # A syntax that always returns true, passing all tokens as implemented. 
This # is useful during development, testing, and should be useful for some types # of transformations as well. class Any < Base def initialize @implements = { :* => [:*] } end def implements?(type, token) true end def implements!(type, token) true end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/base.rb0000644000004100000410000000445213541126475022632 0ustar www-datawww-datarequire 'set' module Regexp::Syntax class NotImplementedError < SyntaxError def initialize(syntax, type, token) super "#{syntax.class.name} does not implement: [#{type}:#{token}]" end end # A lookup map of supported types and tokens in a given syntax class Base include Regexp::Syntax::Token def initialize @implements = {} implements Token::Literal::Type, Token::Literal::All implements Token::FreeSpace::Type, Token::FreeSpace::All end def features @implements end def implementations(type) @implements[type] ||= Set.new end def implements(type, tokens) implementations(type).merge(Array(tokens)) end def excludes(type, tokens) implementations(type).subtract(Array(tokens)) end def implements?(type, token) implementations(type).include?(token) end alias :check? :implements? def implements!(type, token) raise NotImplementedError.new(self, type, token) unless implements?(type, token) end alias :check! :implements! 
def normalize(type, token) case type when :group normalize_group(type, token) when :backref normalize_backref(type, token) else [type, token] end end def normalize_group(type, token) case token when :named_ab, :named_sq [:group, :named] else [type, token] end end def normalize_backref(type, token) case token when :name_ref_ab, :name_ref_sq [:backref, :name_ref] when :name_call_ab, :name_call_sq [:backref, :name_call] when :name_recursion_ref_ab, :name_recursion_ref_sq [:backref, :name_recursion_ref] when :number_ref_ab, :number_ref_sq [:backref, :number_ref] when :number_call_ab, :number_call_sq [:backref, :number_call] when :number_rel_ref_ab, :number_rel_ref_sq [:backref, :number_rel_ref] when :number_rel_call_ab, :number_rel_call_sq [:backref, :number_rel_call] when :number_recursion_ref_ab, :number_recursion_ref_sq [:backref, :number_recursion_ref] else [type, token] end end def self.inspect "#{super} (feature set of #{ancestors[1].to_s.split('::').last})" end end end regexp_parser-1.6.0/lib/regexp_parser/syntax/version_lookup.rb0000644000004100000410000000503413541126476024774 0ustar www-datawww-datamodule Regexp::Syntax VERSION_FORMAT = '\Aruby/\d+\.\d+(\.\d+)?\z' VERSION_REGEXP = /#{VERSION_FORMAT}/ VERSION_CONST_REGEXP = /\AV\d+_\d+(?:_\d+)?\z/ class InvalidVersionNameError < SyntaxError def initialize(name) super "Invalid version name '#{name}'. Expected format is '#{VERSION_FORMAT}'" end end class UnknownSyntaxNameError < SyntaxError def initialize(name) super "Unknown syntax name '#{name}'." end end module_function # Loads and instantiates an instance of the syntax specification class for # the given syntax version name. The special names 'any' and '*' return an # instance of Syntax::Any. 
def new(name) return Regexp::Syntax::Any.new if ['*', 'any'].include?(name.to_s) version_class(name).new end def supported?(name) name =~ VERSION_REGEXP && comparable_version(name) >= comparable_version('1.8.6') end def version_class(version) version =~ VERSION_REGEXP || raise(InvalidVersionNameError, version) version_const_name = version_const_name(version) const_get(version_const_name) || raise(UnknownSyntaxNameError, version) end def version_const_name(version_string) "V#{version_string.to_s.scan(/\d+/).join('_')}" end def const_missing(const_name) if const_name =~ VERSION_CONST_REGEXP return fallback_version_class(const_name) end super end def fallback_version_class(version) sorted_versions = (specified_versions + [version]) .sort_by { |name| comparable_version(name) } return if (version_index = sorted_versions.index(version)) < 1 next_lower_version = sorted_versions[version_index - 1] inherit_from_version(next_lower_version, version) end def inherit_from_version(parent_version, new_version) new_const = version_const_name(new_version) parent = const_get(version_const_name(parent_version)) const_defined?(new_const) || const_set(new_const, Class.new(parent)) warn_if_future_version(new_const) const_get(new_const) end def specified_versions constants.select { |const_name| const_name =~ VERSION_CONST_REGEXP } end def comparable_version(name) # add .99 to treat versions without a patch value as latest patch version Gem::Version.new((name.to_s.scan(/\d+/) << 99).join('.')) end def warn_if_future_version(const_name) return if comparable_version(const_name) < comparable_version('3.0.0') warn('This library has only been tested up to Ruby 2.x, '\ "but you are running with #{const_get(const_name).inspect}") end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/0000755000004100000410000000000013541126475022671 5ustar www-datawww-dataregexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/conditional.rb0000644000004100000410000000056613541126475025530 0ustar 
www-datawww-datamodule Regexp::Syntax module Token module Conditional Delimiters = [:open, :close] Condition = [:condition_open, :condition, :condition_close] Separator = [:separator] All = Conditional::Delimiters + Conditional::Condition + Conditional::Separator Type = :conditional end Map[Conditional::Type] = Conditional::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/group.rb0000644000004100000410000000076413541126475024361 0ustar www-datawww-datamodule Regexp::Syntax module Token module Group Basic = [:capture, :close] Extended = Basic + [:options, :options_switch] Named = [:named] Atomic = [:atomic] Passive = [:passive] Comment = [:comment] V1_8_6 = Group::Extended + Group::Named + Group::Atomic + Group::Passive + Group::Comment V2_4_1 = [:absence] All = V1_8_6 + V2_4_1 Type = :group end Map[Group::Type] = Group::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/character_set.rb0000644000004100000410000000040513541126475026024 0ustar www-datawww-datamodule Regexp::Syntax module Token module CharacterSet Basic = [:open, :close, :negate, :range] Extended = Basic + [:intersection] All = Extended Type = :set end Map[CharacterSet::Type] = CharacterSet::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/escape.rb0000644000004100000410000000134213541126475024456 0ustar www-datawww-datamodule Regexp::Syntax module Token module Escape Basic = [:backslash, :literal] Control = [:control, :meta_sequence] ASCII = [:bell, :backspace, :escape, :form_feed, :newline, :carriage, :tab, :vertical_tab] Unicode = [:codepoint, :codepoint_list] Meta = [:dot, :alternation, :zero_or_one, :zero_or_more, :one_or_more, :bol, :eol, :group_open, :group_close, :interval_open, :interval_close, :set_open, :set_close] Hex = [:hex] Octal = [:octal] All = Basic + Control + ASCII + Unicode + Meta + Hex + Octal Type = :escape end Map[Escape::Type] = Escape::All end end 
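The name handling in `version_lookup.rb` can be tried in isolation. The sketch below copies the bodies of `version_const_name` and `comparable_version` out of the module as top-level methods so they run without the gem. The `known` list and the `requested` name are made-up inputs, and the parent selection is a simplified version of `fallback_version_class` that leaves out the `index < 1` guard.

```ruby
require 'rubygems' # Gem::Version; already loaded on most Rubies

# 'ruby/2.4.1' -> 'V2_4_1' (same body as in version_lookup.rb)
def version_const_name(version_string)
  "V#{version_string.to_s.scan(/\d+/).join('_')}"
end

# Appends .99 so a name without a patch level sorts as the latest patch.
def comparable_version(name)
  Gem::Version.new((name.to_s.scan(/\d+/) << 99).join('.'))
end

puts version_const_name('ruby/2.4.1')

# 'ruby/1.9' becomes 1.9.99, which sorts after 1.9.3.99.
puts comparable_version('ruby/1.9') > comparable_version('ruby/1.9.3')

# Simplified fallback: pick the next lower known version as the parent.
known     = %w[V1_8_6 V1_9_1 V1_9_3 V2_0_0] # hypothetical subset
requested = 'V1_9_2'                        # hypothetical unknown version
sorted    = (known + [requested]).sort_by { |n| comparable_version(n) }
parent    = sorted[sorted.index(requested) - 1]
puts parent
```

Sorting through `Gem::Version` rather than comparing strings keeps multi-digit components in numeric order, so a segment of 10 sorts after a segment of 9.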
regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/assertion.rb0000644000004100000410000000041613541126475025226 0ustar www-datawww-datamodule Regexp::Syntax module Token module Assertion Lookahead = [:lookahead, :nlookahead] Lookbehind = [:lookbehind, :nlookbehind] All = Lookahead + Lookbehind Type = :assertion end Map[Assertion::Type] = Assertion::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/unicode_property.rb0000644000004100000410000004122113541126476026611 0ustar www-datawww-datamodule Regexp::Syntax module Token module UnicodeProperty all = proc { |name| constants.grep(/#{name}/).flat_map(&method(:const_get)) } CharType_V1_9_0 = [:alnum, :alpha, :ascii, :blank, :cntrl, :digit, :graph, :lower, :print, :punct, :space, :upper, :word, :xdigit] CharType_V2_5_0 = [:xposixpunct] POSIX = [:any, :assigned, :newline] module Category Letter = [:letter, :uppercase_letter, :lowercase_letter, :titlecase_letter, :modifier_letter, :other_letter] Mark = [:mark, :nonspacing_mark, :spacing_mark, :enclosing_mark] Number = [:number, :decimal_number, :letter_number, :other_number] Punctuation = [:punctuation, :connector_punctuation, :dash_punctuation, :open_punctuation, :close_punctuation, :initial_punctuation, :final_punctuation, :other_punctuation] Symbol = [:symbol, :math_symbol, :currency_symbol, :modifier_symbol, :other_symbol] Separator = [:separator, :space_separator, :line_separator, :paragraph_separator] Codepoint = [:other, :control, :format, :surrogate, :private_use, :unassigned] All = Letter + Mark + Number + Punctuation + Symbol + Separator + Codepoint end Age_V1_9_3 = [:'age=1.1', :'age=2.0', :'age=2.1', :'age=3.0', :'age=3.1', :'age=3.2', :'age=4.0', :'age=4.1', :'age=5.0', :'age=5.1', :'age=5.2', :'age=6.0'] Age_V2_0_0 = [:'age=6.1'] Age_V2_2_0 = [:'age=6.2', :'age=6.3', :'age=7.0'] Age_V2_3_0 = [:'age=8.0'] Age_V2_4_0 = [:'age=9.0'] Age_V2_5_0 = [:'age=10.0'] Age_V2_6_0 = [:'age=11.0'] Age_V2_6_2 = [:'age=12.0'] Age_V2_6_3 = [:'age=12.1'] Age = 
all[:Age_V] Derived_V1_9_0 = [ :ascii_hex_digit, :alphabetic, :cased, :changes_when_casefolded, :changes_when_casemapped, :changes_when_lowercased, :changes_when_titlecased, :changes_when_uppercased, :case_ignorable, :bidi_control, :dash, :deprecated, :default_ignorable_code_point, :diacritic, :extender, :grapheme_base, :grapheme_extend, :grapheme_link, :hex_digit, :hyphen, :id_continue, :ideographic, :id_start, :ids_binary_operator, :ids_trinary_operator, :join_control, :logical_order_exception, :lowercase, :math, :noncharacter_code_point, :other_alphabetic, :other_default_ignorable_code_point, :other_grapheme_extend, :other_id_continue, :other_id_start, :other_lowercase, :other_math, :other_uppercase, :pattern_syntax, :pattern_white_space, :quotation_mark, :radical, :sentence_terminal, :soft_dotted, :terminal_punctuation, :unified_ideograph, :uppercase, :variation_selector, :white_space, :xid_start, :xid_continue, ] Derived_V2_0_0 = [ :cased_letter, :combining_mark, ] Derived_V2_4_0 = [ :prepended_concatenation_mark, ] Derived_V2_5_0 = [ :regional_indicator ] Derived = all[:Derived_V] Script_V1_9_0 = [ :arabic, :imperial_aramaic, :armenian, :avestan, :balinese, :bamum, :bengali, :bopomofo, :braille, :buginese, :buhid, :canadian_aboriginal, :carian, :cham, :cherokee, :coptic, :cypriot, :cyrillic, :devanagari, :deseret, :egyptian_hieroglyphs, :ethiopic, :georgian, :glagolitic, :gothic, :greek, :gujarati, :gurmukhi, :hangul, :han, :hanunoo, :hebrew, :hiragana, :old_italic, :javanese, :kayah_li, :katakana, :kharoshthi, :khmer, :kannada, :kaithi, :tai_tham, :lao, :latin, :lepcha, :limbu, :linear_b, :lisu, :lycian, :lydian, :malayalam, :mongolian, :meetei_mayek, :myanmar, :nko, :ogham, :ol_chiki, :old_turkic, :oriya, :osmanya, :phags_pa, :inscriptional_pahlavi, :phoenician, :inscriptional_parthian, :rejang, :runic, :samaritan, :old_south_arabian, :saurashtra, :shavian, :sinhala, :sundanese, :syloti_nagri, :syriac, :tagbanwa, :tai_le, :new_tai_lue, :tamil, :tai_viet, 
:telugu, :tifinagh, :tagalog, :thaana, :thai, :tibetan, :ugaritic, :vai, :old_persian, :cuneiform, :yi, :inherited, :common, :unknown ] Script_V1_9_3 = [ :brahmi, :batak, :mandaic ] Script_V2_0_0 = [ :chakma, :meroitic_cursive, :meroitic_hieroglyphs, :miao, :sharada, :sora_sompeng, :takri, ] Script_V2_2_0 = [ :caucasian_albanian, :bassa_vah, :duployan, :elbasan, :grantha, :pahawh_hmong, :khojki, :linear_a, :mahajani, :manichaean, :mende_kikakui, :modi, :mro, :old_north_arabian, :nabataean, :palmyrene, :pau_cin_hau, :old_permic, :psalter_pahlavi, :siddham, :khudawadi, :tirhuta, :warang_citi ] Script_V2_3_0 = [ :ahom, :anatolian_hieroglyphs, :hatran, :multani, :old_hungarian, :signwriting, ] Script_V2_4_0 = [ :adlam, :bhaiksuki, :marchen, :newa, :osage, :tangut, ] Script_V2_5_0 = [ :masaram_gondi, :nushu, :soyombo, :zanabazar_square, ] Script_V2_6_0 = [ :dogra, :gunjala_gondi, :hanifi_rohingya, :makasar, :medefaidrin, :old_sogdian, :sogdian, ] Script_V2_6_2 = [ :egyptian_hieroglyph_format_controls, :elymaic, :nandinagari, :nyiakeng_puachue_hmong, :ottoman_siyaq_numbers, :small_kana_extension, :symbols_and_pictographs_extended_a, :tamil_supplement, :wancho, ] Script = all[:Script_V] UnicodeBlock_V1_9_0 = [ :in_alphabetic_presentation_forms, :in_arabic, :in_armenian, :in_arrows, :in_basic_latin, :in_bengali, :in_block_elements, :in_bopomofo_extended, :in_bopomofo, :in_box_drawing, :in_braille_patterns, :in_buhid, :in_cjk_compatibility_forms, :in_cjk_compatibility_ideographs, :in_cjk_compatibility, :in_cjk_radicals_supplement, :in_cjk_symbols_and_punctuation, :in_cjk_unified_ideographs_extension_a, :in_cjk_unified_ideographs, :in_cherokee, :in_combining_diacritical_marks_for_symbols, :in_combining_diacritical_marks, :in_combining_half_marks, :in_control_pictures, :in_currency_symbols, :in_cyrillic_supplement, :in_cyrillic, :in_devanagari, :in_dingbats, :in_enclosed_alphanumerics, :in_enclosed_cjk_letters_and_months, :in_ethiopic, :in_general_punctuation, 
:in_geometric_shapes, :in_georgian, :in_greek_extended, :in_greek_and_coptic, :in_gujarati, :in_gurmukhi, :in_halfwidth_and_fullwidth_forms, :in_hangul_compatibility_jamo, :in_hangul_jamo, :in_hangul_syllables, :in_hanunoo, :in_hebrew, :in_high_private_use_surrogates, :in_high_surrogates, :in_hiragana, :in_ipa_extensions, :in_ideographic_description_characters, :in_kanbun, :in_kangxi_radicals, :in_kannada, :in_katakana_phonetic_extensions, :in_katakana, :in_khmer_symbols, :in_khmer, :in_lao, :in_latin_extended_additional, :in_letterlike_symbols, :in_limbu, :in_low_surrogates, :in_malayalam, :in_mathematical_operators, :in_miscellaneous_symbols_and_arrows, :in_miscellaneous_symbols, :in_miscellaneous_technical, :in_mongolian, :in_myanmar, :in_number_forms, :in_ogham, :in_optical_character_recognition, :in_oriya, :in_phonetic_extensions, :in_private_use_area, :in_runic, :in_sinhala, :in_small_form_variants, :in_spacing_modifier_letters, :in_specials, :in_superscripts_and_subscripts, :in_supplemental_mathematical_operators, :in_syriac, :in_tagalog, :in_tagbanwa, :in_tai_le, :in_tamil, :in_telugu, :in_thaana, :in_thai, :in_tibetan, :in_unified_canadian_aboriginal_syllabics, :in_variation_selectors, :in_yi_radicals, :in_yi_syllables, :in_yijing_hexagram_symbols, ] UnicodeBlock_V2_0_0 = [ :in_aegean_numbers, :in_alchemical_symbols, :in_ancient_greek_musical_notation, :in_ancient_greek_numbers, :in_ancient_symbols, :in_arabic_extended_a, :in_arabic_mathematical_alphabetic_symbols, :in_arabic_presentation_forms_a, :in_arabic_presentation_forms_b, :in_arabic_supplement, :in_avestan, :in_balinese, :in_bamum, :in_bamum_supplement, :in_batak, :in_brahmi, :in_buginese, :in_byzantine_musical_symbols, :in_cjk_compatibility_ideographs_supplement, :in_cjk_strokes, :in_cjk_unified_ideographs_extension_b, :in_cjk_unified_ideographs_extension_c, :in_cjk_unified_ideographs_extension_d, :in_carian, :in_chakma, :in_cham, :in_combining_diacritical_marks_supplement, 
:in_common_indic_number_forms, :in_coptic, :in_counting_rod_numerals, :in_cuneiform, :in_cuneiform_numbers_and_punctuation, :in_cypriot_syllabary, :in_cyrillic_extended_a, :in_cyrillic_extended_b, :in_deseret, :in_devanagari_extended, :in_domino_tiles, :in_egyptian_hieroglyphs, :in_emoticons, :in_enclosed_alphanumeric_supplement, :in_enclosed_ideographic_supplement, :in_ethiopic_extended, :in_ethiopic_extended_a, :in_ethiopic_supplement, :in_georgian_supplement, :in_glagolitic, :in_gothic, :in_hangul_jamo_extended_a, :in_hangul_jamo_extended_b, :in_imperial_aramaic, :in_inscriptional_pahlavi, :in_inscriptional_parthian, :in_javanese, :in_kaithi, :in_kana_supplement, :in_kayah_li, :in_kharoshthi, :in_latin_1_supplement, :in_latin_extended_a, :in_latin_extended_b, :in_latin_extended_c, :in_latin_extended_d, :in_lepcha, :in_linear_b_ideograms, :in_linear_b_syllabary, :in_lisu, :in_lycian, :in_lydian, :in_mahjong_tiles, :in_mandaic, :in_mathematical_alphanumeric_symbols, :in_meetei_mayek, :in_meetei_mayek_extensions, :in_meroitic_cursive, :in_meroitic_hieroglyphs, :in_miao, :in_miscellaneous_mathematical_symbols_a, :in_miscellaneous_mathematical_symbols_b, :in_miscellaneous_symbols_and_pictographs, :in_modifier_tone_letters, :in_musical_symbols, :in_myanmar_extended_a, :in_nko, :in_new_tai_lue, :in_no_block, :in_ol_chiki, :in_old_italic, :in_old_persian, :in_old_south_arabian, :in_old_turkic, :in_osmanya, :in_phags_pa, :in_phaistos_disc, :in_phoenician, :in_phonetic_extensions_supplement, :in_playing_cards, :in_rejang, :in_rumi_numeral_symbols, :in_samaritan, :in_saurashtra, :in_sharada, :in_shavian, :in_sora_sompeng, :in_sundanese, :in_sundanese_supplement, :in_supplemental_arrows_a, :in_supplemental_arrows_b, :in_supplemental_punctuation, :in_supplementary_private_use_area_a, :in_supplementary_private_use_area_b, :in_syloti_nagri, :in_tags, :in_tai_tham, :in_tai_viet, :in_tai_xuan_jing_symbols, :in_takri, :in_tifinagh, :in_transport_and_map_symbols, :in_ugaritic, 
:in_unified_canadian_aboriginal_syllabics_extended, :in_vai, :in_variation_selectors_supplement, :in_vedic_extensions, :in_vertical_forms, ] UnicodeBlock_V2_2_0 = [ :in_bassa_vah, :in_caucasian_albanian, :in_combining_diacritical_marks_extended, :in_coptic_epact_numbers, :in_duployan, :in_elbasan, :in_geometric_shapes_extended, :in_grantha, :in_khojki, :in_khudawadi, :in_latin_extended_e, :in_linear_a, :in_mahajani, :in_manichaean, :in_mende_kikakui, :in_modi, :in_mro, :in_myanmar_extended_b, :in_nabataean, :in_old_north_arabian, :in_old_permic, :in_ornamental_dingbats, :in_pahawh_hmong, :in_palmyrene, :in_pau_cin_hau, :in_psalter_pahlavi, :in_shorthand_format_controls, :in_siddham, :in_sinhala_archaic_numbers, :in_supplemental_arrows_c, :in_tirhuta, :in_warang_citi, ] UnicodeBlock_V2_3_0 = [ :in_ahom, :in_anatolian_hieroglyphs, :in_cjk_unified_ideographs_extension_e, :in_cherokee_supplement, :in_early_dynastic_cuneiform, :in_hatran, :in_multani, :in_old_hungarian, :in_supplemental_symbols_and_pictographs, :in_sutton_signwriting, ] UnicodeBlock_V2_4_0 = [ :in_adlam, :in_bhaiksuki, :in_cyrillic_extended_c, :in_glagolitic_supplement, :in_ideographic_symbols_and_punctuation, :in_marchen, :in_mongolian_supplement, :in_newa, :in_osage, :in_tangut, :in_tangut_components, ] UnicodeBlock_V2_5_0 = [ :in_cjk_unified_ideographs_extension_f, :in_kana_extended_a, :in_masaram_gondi, :in_nushu, :in_soyombo, :in_syriac_supplement, :in_zanabazar_square, ] UnicodeBlock_V2_6_0 = [ :in_chess_symbols, :in_dogra, :in_georgian_extended, :in_gunjala_gondi, :in_hanifi_rohingya, :in_indic_siyaq_numbers, :in_makasar, :in_mayan_numerals, :in_medefaidrin, :in_old_sogdian, :in_sogdian, ] UnicodeBlock_V2_6_2 = [ :in_egyptian_hieroglyph_format_controls, :in_elymaic, :in_nandinagari, :in_nyiakeng_puachue_hmong, :in_ottoman_siyaq_numbers, :in_small_kana_extension, :in_symbols_and_pictographs_extended_a, :in_tamil_supplement, :in_wancho, ] UnicodeBlock = all[:UnicodeBlock_V] Emoji_V2_5_0 = [ :emoji, 
:emoji_component, :emoji_modifier, :emoji_modifier_base, :emoji_presentation, ] Emoji = all[:Emoji_V] V1_9_0 = Category::All + POSIX + all[:V1_9_0] V1_9_3 = all[:V1_9_3] V2_0_0 = all[:V2_0_0] V2_2_0 = all[:V2_2_0] V2_3_0 = all[:V2_3_0] V2_4_0 = all[:V2_4_0] V2_5_0 = all[:V2_5_0] V2_6_0 = all[:V2_6_0] V2_6_2 = all[:V2_6_2] V2_6_3 = all[:V2_6_3] All = all[/^V\d+_\d+_\d+$/] Type = :property NonType = :nonproperty end Map[UnicodeProperty::Type] = UnicodeProperty::All Map[UnicodeProperty::NonType] = UnicodeProperty::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/backref.rb0000644000004100000410000000116213541126475024613 0ustar www-datawww-datamodule Regexp::Syntax module Token module Backreference Name = [:name_ref] Number = [:number, :number_ref, :number_rel_ref] RecursionLevel = [:name_recursion_ref, :number_recursion_ref] All = Name + Number + RecursionLevel Type = :backref end # Type is the same as Backreference so keeping it here, for now. module SubexpressionCall Name = [:name_call] Number = [:number_call, :number_rel_call] All = Name + Number end Map[Backreference::Type] = Backreference::All + SubexpressionCall::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/keep.rb0000644000004100000410000000024313541126475024141 0ustar www-datawww-datamodule Regexp::Syntax module Token module Keep Mark = [:mark] All = Mark Type = :keep end Map[Keep::Type] = Keep::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/posix_class.rb0000644000004100000410000000066013541126475025547 0ustar www-datawww-datamodule Regexp::Syntax module Token module PosixClass Standard = [:alnum, :alpha, :blank, :cntrl, :digit, :graph, :lower, :print, :punct, :space, :upper, :xdigit] Extensions = [:ascii, :word] All = Standard + Extensions Type = :posixclass NonType = :nonposixclass end Map[PosixClass::Type] = PosixClass::All Map[PosixClass::NonType] = PosixClass::All end end 
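`unicode_property.rb` builds its aggregate constants with a small `all` proc that greps the module's constants by name and flattens their values. The sketch below reproduces that idiom on a tiny module. `MiniProps` and its three token lists are hypothetical; only the `all` proc is taken from the source. Results are sorted before printing because the example does not rely on the order in which `constants` returns names.

```ruby
# Hypothetical miniature of Regexp::Syntax::Token::UnicodeProperty.
module MiniProps
  # Collects the values of every constant whose name matches `name`.
  # Inside the module body, `constants` and `const_get` refer to MiniProps.
  all = proc { |name| constants.grep(/#{name}/).flat_map(&method(:const_get)) }

  Script_V1_9_0 = [:latin, :greek]
  Script_V2_0_0 = [:miao]
  Age_V2_0_0    = [:'age=6.1']

  # Every script, regardless of the version that introduced it.
  Script = all[:Script_V]

  # Everything introduced in 2.0.0, regardless of category.
  V2_0_0 = all[:V2_0_0]
end

p MiniProps::Script.sort
p MiniProps::V2_0_0.sort
```

The proc only sees constants defined before it is called, which is why the aggregates such as `Script` and `V2_0_0` are assigned after the per-version lists.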
regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/character_type.rb0000644000004100000410000000056313541126475026217 0ustar www-datawww-datamodule Regexp::Syntax module Token module CharacterType Basic = [] Extended = [:digit, :nondigit, :space, :nonspace, :word, :nonword] Hex = [:hex, :nonhex] Clustered = [:linebreak, :xgrapheme] All = Basic + Extended + Hex + Clustered Type = :type end Map[CharacterType::Type] = CharacterType::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/quantifier.rb0000644000004100000410000000142713541126475025371 0ustar www-datawww-datamodule Regexp::Syntax module Token module Quantifier Greedy = [ :zero_or_one, :zero_or_more, :one_or_more ] Reluctant = [ :zero_or_one_reluctant, :zero_or_more_reluctant, :one_or_more_reluctant ] Possessive = [ :zero_or_one_possessive, :zero_or_more_possessive, :one_or_more_possessive ] Interval = [:interval] IntervalReluctant = [:interval_reluctant] IntervalPossessive = [:interval_possessive] IntervalAll = Interval + IntervalReluctant + IntervalPossessive All = Greedy + Reluctant + Possessive + IntervalAll Type = :quantifier end Map[Quantifier::Type] = Quantifier::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/meta.rb0000644000004100000410000000032113541126475024140 0ustar www-datawww-datamodule Regexp::Syntax module Token module Meta Basic = [:dot] Extended = Basic + [:alternation] All = Extended Type = :meta end Map[Meta::Type] = Meta::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/tokens/anchor.rb0000644000004100000410000000054113541126475024470 0ustar www-datawww-datamodule Regexp::Syntax module Token module Anchor Basic = [:bol, :eol] Extended = Basic + [:word_boundary, :nonword_boundary] String = [:bos, :eos, :eos_ob_eol] MatchStart = [:match_start] All = Extended + String + MatchStart Type = :anchor end Map[Anchor::Type] = Anchor::All end end regexp_parser-1.6.0/lib/regexp_parser/syntax/versions.rb0000644000004100000410000000043413541126476023565 0ustar 
www-datawww-data# Ruby 1.8.x is no longer a supported runtime, # but its regex features are still recognized. # # Aliases for the latest patch version are provided as 'ruby/n.n', # e.g. 'ruby/1.9' refers to Ruby v1.9.3. Dir[File.expand_path('../versions/*.rb', __FILE__)].sort.each { |f| require f } regexp_parser-1.6.0/lib/regexp_parser/syntax.rb0000644000004100000410000000053413541126475021715 0ustar www-datawww-datarequire File.expand_path('../syntax/tokens', __FILE__) require File.expand_path('../syntax/base', __FILE__) require File.expand_path('../syntax/any', __FILE__) require File.expand_path('../syntax/version_lookup', __FILE__) require File.expand_path('../syntax/versions', __FILE__) module Regexp::Syntax class SyntaxError < StandardError; end end regexp_parser-1.6.0/lib/regexp_parser/token.rb0000644000004100000410000000074713541126476021516 0ustar www-datawww-dataclass Regexp TOKEN_KEYS = [ :type, :token, :text, :ts, :te, :level, :set_level, :conditional_level ].freeze Token = Struct.new(*TOKEN_KEYS) do attr_accessor :previous, :next def offset [ts, te] end def length te - ts end if RUBY_VERSION < '2.0.0' def to_h members.inject({}) do |hash, member| hash[member.to_sym] = self[member] hash end end end end end regexp_parser-1.6.0/lib/regexp_parser/scanner.rb0000644000004100000410000023063313541126475022025 0ustar www-datawww-data# -*- warn-indent:false; -*- # line 1 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" # line 661 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" # THIS IS A GENERATED FILE, DO NOT EDIT DIRECTLY # This file was generated from lib/regexp_parser/scanner/scanner.rl class Regexp::Scanner # General scanner error (catch all) class ScannerError < StandardError; end # Base for all scanner validation errors class ValidationError < StandardError def initialize(reason) super reason end end # Unexpected end of pattern class PrematureEndError < ScannerError def initialize(where = '') 
super "Premature end of pattern at #{where}" end end # Invalid sequence format. Used for escape sequences, mainly. class InvalidSequenceError < ValidationError def initialize(what = 'sequence', where = '') super "Invalid #{what} at #{where}" end end # Invalid group. Used for named groups. class InvalidGroupError < ValidationError def initialize(what, reason) super "Invalid #{what}, #{reason}." end end # Invalid group option. Used for inline options. class InvalidGroupOption < ValidationError def initialize(option, text) super "Invalid group option #{option} in #{text}" end end # Invalid back reference. Used for name and number refs/calls. class InvalidBackrefError < ValidationError def initialize(what, reason) super "Invalid back reference #{what}, #{reason}" end end # The property name was not recognized by the scanner. class UnknownUnicodePropertyError < ValidationError def initialize(name) super "Unknown unicode character property name #{name}" end end # Scans the given regular expression text or Regexp object and collects the # emitted tokens into an array that gets returned at the end. If a block is # given, it gets called for each emitted token. # # This method may raise errors if a syntax error is encountered. # -------------------------------------------------------------------------- def self.scan(input_object, &block) new.scan(input_object, &block) end def scan(input_object, &block) self.literal = nil stack = [] if input_object.is_a?(Regexp) input = input_object.source self.free_spacing = (input_object.options & Regexp::EXTENDED != 0) else input = input_object self.free_spacing = false end self.spacing_stack = [{:free_spacing => free_spacing, :depth => 0}] data = input.unpack("c*") if input.is_a?(String) eof = data.length self.tokens = [] self.block = block_given? ?
block : nil self.set_depth = 0 self.group_depth = 0 self.conditional_stack = [] # line 98 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" class << self attr_accessor :_re_scanner_trans_keys private :_re_scanner_trans_keys, :_re_scanner_trans_keys= end self._re_scanner_trans_keys = [ 0, 0, -128, -65, -128, -65, -128, -65, -128, -65, -128, -65, -128, -65, 10, 10, 41, 41, 39, 122, 33, 122, 48, 122, 39, 60, 39, 122, 48, 57, 39, 57, 48, 57, 39, 57, 39, 122, 43, 122, 48, 57, 48, 62, 48, 57, 43, 62, 43, 122, 44, 125, 48, 125, 123, 123, 9, 122, 9, 125, 9, 122, -128, -65, -128, -65, 38, 38, 45, 122, 45, 122, 93, 93, 94, 120, 97, 120, 108, 115, 110, 112, 117, 117, 109, 109, 58, 58, 93, 93, 104, 104, 97, 97, 99, 99, 105, 105, 105, 105, 108, 108, 97, 97, 110, 110, 107, 107, 110, 110, 116, 116, 114, 114, 108, 108, 105, 105, 103, 103, 105, 105, 116, 116, 114, 114, 97, 97, 112, 112, 104, 104, 111, 111, 119, 119, 101, 101, 114, 114, 114, 117, 105, 105, 110, 110, 110, 110, 99, 99, 112, 112, 97, 97, 99, 99, 101, 101, 112, 112, 112, 112, 111, 111, 114, 114, 100, 100, 100, 100, 65, 122, 61, 61, 93, 93, 45, 45, 92, 92, 92, 92, 45, 45, 92, 92, 92, 92, 48, 123, 48, 102, 48, 102, 48, 102, 48, 102, 9, 125, 9, 125, 9, 125, 9, 125, 9, 125, 9, 125, 48, 123, 41, 41, 39, 122, 41, 57, 48, 122, -62, 127, -62, -33, -32, -17, -16, -12, 1, 127, 1, 127, 9, 32, 33, 126, 10, 126, 63, 63, 33, 126, 33, 126, 43, 63, 43, 63, 43, 63, 65, 122, 43, 63, 68, 119, 80, 112, -62, 125, -128, -65, -128, -65, -128, -65, 38, 38, 38, 93, 46, 61, 48, 122, 36, 125, 48, 55, 48, 55, 77, 77, 45, 45, 0, 0, 67, 99, 45, 45, 0, 0, 92, 92, 48, 102, 39, 60, 39, 122, 49, 57, 41, 57, 48, 122, 0 ] class << self attr_accessor :_re_scanner_key_spans private :_re_scanner_key_spans, :_re_scanner_key_spans= end self._re_scanner_key_spans = [ 0, 64, 64, 64, 64, 64, 64, 1, 1, 84, 90, 75, 22, 84, 10, 19, 10, 19, 84, 80, 10, 15, 10, 20, 80, 82, 78, 1, 114, 117, 114, 64, 64, 1, 78, 78, 1, 27, 24, 8, 3, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 58, 1, 1, 1, 1, 1, 1, 1, 1, 76, 55, 55, 55, 55, 117, 117, 117, 117, 117, 117, 76, 1, 84, 17, 75, 190, 30, 16, 5, 127, 127, 24, 94, 117, 1, 94, 94, 21, 21, 21, 58, 21, 52, 33, 188, 64, 64, 64, 1, 56, 16, 75, 90, 8, 8, 1, 1, 0, 33, 1, 0, 1, 55, 22, 84, 9, 17, 75 ] class << self attr_accessor :_re_scanner_index_offsets private :_re_scanner_index_offsets, :_re_scanner_index_offsets= end self._re_scanner_index_offsets = [ 0, 0, 65, 130, 195, 260, 325, 390, 392, 394, 479, 570, 646, 669, 754, 765, 785, 796, 816, 901, 982, 993, 1009, 1020, 1041, 1122, 1205, 1284, 1286, 1401, 1519, 1634, 1699, 1764, 1766, 1845, 1924, 1926, 1954, 1979, 1988, 1992, 1994, 1996, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020, 2022, 2024, 2026, 2028, 2030, 2032, 2034, 2036, 2038, 2040, 2042, 2044, 2046, 2048, 2050, 2055, 2057, 2059, 2061, 2063, 2065, 2067, 2069, 2071, 2073, 2075, 2077, 2079, 2081, 2083, 2142, 2144, 2146, 2148, 2150, 2152, 2154, 2156, 2158, 2235, 2291, 2347, 2403, 2459, 2577, 2695, 2813, 2931, 3049, 3167, 3244, 3246, 3331, 3349, 3425, 3616, 3647, 3664, 3670, 3798, 3926, 3951, 4046, 4164, 4166, 4261, 4356, 4378, 4400, 4422, 4481, 4503, 4556, 4590, 4779, 4844, 4909, 4974, 4976, 5033, 5050, 5126, 5217, 5226, 5235, 5237, 5239, 5240, 5274, 5276, 5277, 5279, 5335, 5358, 5443, 5453, 5471 ] class << self attr_accessor :_re_scanner_indicies private :_re_scanner_indicies, :_re_scanner_indicies= end self._re_scanner_indicies = [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 9, 8, 12, 11, 13, 10, 10, 10, 10, 10, 10, 10, 10, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 10, 10, 10, 10, 10, 10, 10, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 10, 10, 10, 10, 14, 10, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 10, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 10, 10, 15, 13, 10, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 10, 10, 10, 16, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 10, 10, 10, 13, 10, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 10, 10, 10, 16, 10, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 10, 18, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 19, 17, 20, 17, 17, 17, 21, 17, 22, 17, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 17, 17, 17, 17, 17, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 17, 17, 17, 23, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 17, 20, 17, 17, 17, 17, 17, 17, 17, 17, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 17, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25, 17, 20, 17, 17, 17, 21, 17, 21, 17, 17, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 17, 20, 17, 17, 17, 21, 17, 21, 17, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 17, 17, 17, 17, 17, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 17, 17, 17, 23, 17, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 17, 26, 17, 27, 17, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 17, 17, 17, 20, 17, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 17, 17, 17, 28, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 17, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 17, 17, 17, 17, 20, 17, 29, 30, 30, 30, 30, 30, 30, 30, 30, 30, 17, 26, 17, 26, 17, 17, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 17, 17, 17, 17, 20, 17, 26, 17, 26, 17, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 17, 17, 17, 20, 17, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 17, 17, 17, 28, 17, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 17, 32, 31, 31, 31, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 34, 31, 32, 
32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 34, 31, 35, 36, 37, 37, 37, 37, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 37, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 38, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 37, 37, 37, 37, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 37, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 36, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 39, 36, 37, 37, 37, 37, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 37, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 36, 36, 36, 37, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 36, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 40, 42, 42, 42, 42, 42, 42, 42, 
42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 40, 44, 43, 47, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 46, 47, 48, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 46, 49, 46, 50, 46, 46, 51, 52, 53, 54, 46, 46, 55, 46, 46, 46, 46, 56, 46, 46, 46, 57, 46, 46, 58, 46, 59, 46, 60, 61, 46, 51, 52, 53, 54, 46, 46, 55, 46, 46, 46, 46, 56, 46, 46, 46, 57, 46, 46, 58, 46, 59, 46, 60, 61, 46, 62, 46, 46, 46, 46, 46, 46, 63, 46, 64, 46, 65, 46, 66, 46, 67, 46, 68, 46, 69, 46, 70, 46, 67, 46, 71, 46, 72, 46, 67, 46, 73, 46, 74, 46, 75, 46, 67, 46, 76, 46, 77, 46, 78, 46, 67, 46, 79, 46, 80, 46, 81, 46, 67, 46, 82, 46, 83, 46, 84, 46, 67, 46, 85, 46, 86, 46, 87, 46, 67, 46, 88, 46, 46, 89, 46, 90, 46, 81, 46, 91, 46, 81, 46, 92, 46, 93, 46, 94, 46, 67, 46, 95, 46, 86, 46, 96, 46, 97, 46, 67, 46, 54, 46, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 46, 46, 46, 46, 46, 46, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 46, 99, 46, 100, 46, 101, 36, 103, 102, 105, 102, 106, 36, 108, 107, 110, 107, 111, 111, 111, 111, 111, 111, 111, 111, 111, 111, 36, 36, 36, 36, 36, 36, 36, 111, 111, 111, 111, 111, 111, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 111, 111, 111, 111, 111, 111, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 36, 36, 36, 36, 36, 36, 36, 113, 113, 113, 113, 113, 113, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 113, 113, 113, 113, 113, 113, 36, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 36, 36, 36, 36, 36, 36, 36, 114, 114, 114, 114, 114, 114, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 114, 114, 114, 114, 114, 114, 36, 115, 115, 115, 115, 115, 115, 115, 115, 115, 115, 36, 36, 36, 36, 36, 36, 36, 115, 115, 115, 115, 115, 115, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 115, 115, 115, 115, 115, 36, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 36, 36, 36, 36, 36, 36, 36, 116, 116, 116, 116, 116, 116, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 116, 116, 116, 116, 116, 116, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 117, 117, 117, 117, 117, 117, 117, 117, 117, 117, 36, 36, 36, 36, 36, 36, 36, 117, 117, 117, 117, 117, 117, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 117, 117, 117, 117, 117, 117, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 118, 118, 118, 118, 118, 118, 118, 118, 118, 118, 36, 36, 36, 36, 36, 36, 36, 118, 118, 118, 118, 118, 118, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 118, 118, 118, 118, 118, 118, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 119, 119, 119, 119, 119, 119, 119, 119, 119, 119, 36, 36, 36, 36, 36, 36, 36, 119, 119, 119, 119, 119, 119, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 119, 119, 119, 119, 119, 119, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 36, 36, 36, 36, 36, 36, 36, 120, 120, 120, 120, 120, 120, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 120, 120, 120, 120, 120, 120, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 121, 121, 121, 121, 121, 121, 121, 121, 121, 121, 36, 36, 36, 36, 36, 36, 36, 121, 121, 121, 121, 121, 121, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 121, 121, 121, 121, 121, 121, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 112, 112, 112, 112, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 112, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 115, 36, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 122, 122, 122, 122, 122, 122, 122, 123, 123, 123, 123, 123, 123, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 123, 123, 123, 123, 123, 123, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 122, 36, 122, 125, 124, 126, 124, 124, 124, 124, 124, 124, 124, 124, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 124, 124, 124, 124, 124, 124, 124, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 124, 124, 124, 124, 127, 124, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 124, 125, 124, 124, 124, 124, 124, 124, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 124, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 124, 124, 124, 124, 126, 124, 124, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 124, 124, 124, 124, 129, 124, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 124, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 132, 132, 132, 132, 132, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 133, 133, 133, 133, 133, 133, 133, 133, 134, 134, 134, 134, 134, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 135, 136, 136, 137, 138, 136, 136, 136, 139, 140, 141, 142, 136, 136, 143, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 
136, 136, 136, 136, 136, 144, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 145, 146, 31, 147, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 33, 148, 31, 136, 133, 31, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 149, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 131, 149, 132, 132, 132, 132, 132, 149, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 149, 133, 149, 133, 133, 133, 133, 133, 133, 133, 133, 134, 134, 134, 134, 134, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 135, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 133, 150, 135, 135, 135, 135, 135, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 
150, 150, 150, 150, 135, 150, 136, 136, 136, 149, 136, 136, 136, 149, 149, 149, 149, 136, 136, 149, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 149, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 149, 149, 149, 149, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 136, 149, 149, 149, 136, 149, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 137, 137, 137, 8, 137, 137, 137, 8, 8, 8, 8, 137, 137, 8, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 8, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 8, 8, 8, 8, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 137, 8, 8, 8, 137, 8, 152, 151, 15, 154, 11, 154, 154, 154, 14, 155, 153, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 13, 154, 156, 15, 13, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 13, 154, 153, 154, 153, 154, 154, 154, 153, 153, 153, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 157, 154, 153, 153, 153, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154, 153, 154, 159, 158, 158, 158, 158, 
158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 158, 159, 158, 161, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 160, 161, 160, 163, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 163, 162, 165, 165, 164, 164, 164, 164, 165, 164, 164, 164, 166, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 165, 164, 164, 164, 164, 164, 164, 164, 165, 164, 164, 164, 164, 167, 164, 164, 164, 167, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 164, 165, 164, 169, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 168, 169, 168, 170, 36, 36, 36, 170, 36, 36, 36, 36, 36, 36, 36, 36, 36, 170, 170, 36, 36, 36, 170, 170, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 170, 36, 36, 36, 170, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 170, 36, 36, 36, 170, 36, 171, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 171, 36, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 172, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 173, 174, 174, 174, 174, 174, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 175, 41, 176, 41, 175, 175, 175, 175, 41, 177, 175, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 175, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 178, 179, 180, 181, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 175, 175, 175, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 
41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 182, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 182, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 183, 182, 184, 182, 186, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 185, 187, 185, 190, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 189, 191, 189, 189, 192, 189, 194, 194, 194, 194, 194, 194, 194, 194, 194, 194, 193, 193, 193, 193, 193, 193, 193, 194, 194, 194, 193, 193, 193, 194, 193, 193, 193, 194, 193, 194, 193, 193, 193, 193, 194, 193, 193, 193, 193, 193, 194, 193, 194, 193, 193, 193, 193, 193, 193, 193, 193, 194, 193, 193, 193, 194, 193, 193, 193, 194, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 194, 193, 196, 195, 195, 195, 196, 196, 196, 196, 195, 195, 196, 195, 197, 198, 198, 198, 198, 198, 198, 198, 199, 199, 195, 195, 195, 195, 195, 196, 195, 36, 36, 200, 201, 195, 195, 36, 201, 195, 195, 36, 195, 202, 195, 195, 203, 195, 201, 201, 195, 195, 195, 201, 201, 195, 36, 196, 196, 196, 196, 195, 195, 204, 204, 101, 201, 204, 204, 36, 201, 195, 195, 36, 195, 195, 204, 195, 203, 195, 204, 201, 204, 205, 204, 201, 206, 195, 36, 196, 196, 196, 195, 208, 208, 208, 208, 208, 208, 208, 208, 207, 210, 
210, 210, 210, 210, 210, 210, 210, 209, 212, 102, 214, 213, 102, 216, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 107, 217, 107, 219, 218, 107, 110, 107, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 221, 221, 221, 221, 221, 221, 221, 222, 222, 222, 222, 222, 222, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 222, 222, 222, 222, 222, 222, 221, 224, 223, 223, 223, 223, 223, 225, 223, 223, 223, 226, 226, 226, 226, 226, 226, 226, 226, 226, 223, 223, 227, 223, 126, 228, 228, 228, 228, 228, 228, 228, 228, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 228, 228, 228, 228, 228, 228, 228, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 228, 228, 228, 228, 127, 228, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 228, 128, 128, 128, 128, 128, 128, 128, 128, 128, 228, 125, 228, 228, 228, 228, 228, 228, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 228, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 228, 228, 228, 228, 126, 228, 228, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 228, 228, 228, 228, 129, 228, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 129, 228, 0 ] class << self attr_accessor :_re_scanner_trans_targs private :_re_scanner_trans_targs, :_re_scanner_trans_targs= end self._re_scanner_trans_targs = [ 110, 111, 3, 112, 5, 6, 113, 110, 7, 110, 110, 8, 110, 110, 9, 110, 11, 110, 13, 19, 110, 14, 16, 18, 15, 17, 20, 22, 24, 21, 23, 0, 26, 25, 126, 28, 0, 29, 30, 128, 129, 129, 31, 129, 129, 129, 129, 35, 36, 129, 38, 39, 50, 54, 58, 62, 66, 70, 
75, 79, 81, 84, 40, 47, 41, 45, 42, 43, 44, 129, 46, 48, 49, 51, 52, 53, 55, 56, 57, 59, 60, 61, 63, 64, 65, 67, 68, 69, 71, 73, 72, 74, 76, 77, 78, 80, 82, 83, 86, 87, 129, 89, 137, 140, 137, 142, 92, 137, 143, 137, 145, 95, 98, 96, 97, 137, 99, 100, 101, 102, 103, 104, 137, 147, 148, 148, 106, 107, 108, 109, 1, 2, 4, 114, 115, 116, 117, 118, 110, 119, 110, 122, 123, 110, 124, 110, 125, 110, 110, 110, 110, 110, 120, 110, 121, 110, 10, 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 12, 110, 110, 127, 27, 130, 131, 132, 129, 133, 134, 135, 129, 129, 129, 129, 32, 129, 129, 33, 129, 129, 129, 34, 37, 85, 136, 136, 137, 137, 138, 138, 137, 88, 137, 91, 137, 137, 94, 105, 137, 139, 137, 137, 137, 141, 137, 90, 137, 144, 146, 137, 93, 137, 137, 137, 148, 149, 150, 151, 152, 148 ] class << self attr_accessor :_re_scanner_trans_actions private :_re_scanner_trans_actions, :_re_scanner_trans_actions= end self._re_scanner_trans_actions = [ 1, 2, 0, 2, 0, 0, 2, 3, 0, 4, 5, 6, 7, 8, 0, 9, 0, 10, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 14, 15, 16, 0, 17, 18, 19, 20, 0, 0, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 24, 0, 25, 0, 0, 26, 0, 27, 0, 0, 0, 0, 0, 28, 0, 0, 0, 0, 0, 0, 29, 0, 30, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 35, 36, 37, 0, 0, 38, 0, 39, 34, 40, 41, 42, 43, 44, 45, 46, 0, 47, 0, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 0, 58, 59, 61, 0, 0, 34, 34, 62, 0, 34, 63, 64, 65, 66, 67, 0, 68, 69, 0, 70, 71, 72, 0, 0, 0, 73, 74, 75, 76, 77, 78, 79, 0, 80, 0, 81, 82, 0, 0, 83, 0, 84, 85, 86, 34, 87, 0, 88, 34, 0, 89, 0, 90, 91, 92, 93, 34, 34, 34, 34, 94 ] class << self attr_accessor :_re_scanner_to_state_actions private :_re_scanner_to_state_actions, :_re_scanner_to_state_actions= end self._re_scanner_to_state_actions = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 60, 60, 0, 0, 0, 0, 0, 0, 60, 60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 0, 0, 0, 0 ] class << self attr_accessor :_re_scanner_from_state_actions private :_re_scanner_from_state_actions, :_re_scanner_from_state_actions= end self._re_scanner_from_state_actions = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 33, 33, 0, 0, 0, 0, 0, 0, 33, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0 ] class << self attr_accessor :_re_scanner_eof_actions private :_re_scanner_eof_actions, :_re_scanner_eof_actions= end self._re_scanner_eof_actions = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 13, 13, 13, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 0, 12, 12, 0, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ] class << self attr_accessor :_re_scanner_eof_trans private :_re_scanner_eof_trans, :_re_scanner_eof_trans= end self._re_scanner_eof_trans = [ 0, 1, 1, 1, 1, 1, 1, 8, 11, 11, 11, 11, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 0, 0, 0, 0, 0, 0, 41, 41, 44, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 
46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 0, 0, 105, 0, 0, 110, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125, 125, 125, 125, 0, 150, 150, 150, 150, 151, 151, 150, 150, 152, 154, 154, 159, 161, 163, 165, 169, 0, 0, 0, 183, 183, 183, 183, 186, 189, 0, 0, 208, 210, 212, 212, 212, 216, 216, 216, 216, 221, 0, 229, 229, 229, 229 ] class << self attr_accessor :re_scanner_start end self.re_scanner_start = 110; class << self attr_accessor :re_scanner_first_final end self.re_scanner_first_final = 110; class << self attr_accessor :re_scanner_error end self.re_scanner_error = 0; class << self attr_accessor :re_scanner_en_char_type end self.re_scanner_en_char_type = 127; class << self attr_accessor :re_scanner_en_unicode_property end self.re_scanner_en_unicode_property = 128; class << self attr_accessor :re_scanner_en_character_set end self.re_scanner_en_character_set = 129; class << self attr_accessor :re_scanner_en_set_escape_sequence end self.re_scanner_en_set_escape_sequence = 136; class << self attr_accessor :re_scanner_en_escape_sequence end self.re_scanner_en_escape_sequence = 137; class << self attr_accessor :re_scanner_en_conditional_expression end self.re_scanner_en_conditional_expression = 148; class << self attr_accessor :re_scanner_en_main end self.re_scanner_en_main = 110; # line 753 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" # line 1144 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" begin p ||= 0 pe ||= data.length cs = re_scanner_start top = 0 ts = nil te = nil act = 0 end # line 754 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" # line 1157 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" begin testEof = false _slen, _trans, _keys, _inds, _acts, _nacts = nil _goto_level = 0 _resume = 10 _eof_trans = 15 _again = 20 _test_eof = 30 _out = 40 while true if _goto_level <= 0 
if p == pe _goto_level = _test_eof next end if cs == 0 _goto_level = _out next end end if _goto_level <= _resume case _re_scanner_from_state_actions[cs] when 33 then # line 1 "NONE" begin ts = p end # line 1185 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" end _keys = cs << 1 _inds = _re_scanner_index_offsets[cs] _slen = _re_scanner_key_spans[cs] _wide = data[p].ord _trans = if ( _slen > 0 && _re_scanner_trans_keys[_keys] <= _wide && _wide <= _re_scanner_trans_keys[_keys + 1] ) then _re_scanner_indicies[ _inds + _wide - _re_scanner_trans_keys[_keys] ] else _re_scanner_indicies[ _inds + _slen ] end end if _goto_level <= _eof_trans cs = _re_scanner_trans_targs[_trans] if _re_scanner_trans_actions[_trans] != 0 case _re_scanner_trans_actions[_trans] when 12 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end when 36 then # line 143 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth + 1 end when 6 then # line 144 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth - 1 end when 34 then # line 1 "NONE" begin te = p+1 end when 61 then # line 12 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/char_type.rl" begin te = p+1 begin case text = text(data, ts, te, 1).first when '\d'; emit(:type, :digit, text, ts - 1, te) when '\D'; emit(:type, :nondigit, text, ts - 1, te) when '\h'; emit(:type, :hex, text, ts - 1, te) when '\H'; emit(:type, :nonhex, text, ts - 1, te) when '\s'; emit(:type, :space, text, ts - 1, te) when '\S'; emit(:type, :nonspace, text, ts - 1, te) when '\w'; emit(:type, :word, text, ts - 1, te) when '\W'; emit(:type, :nonword, text, ts - 1, te) when '\R'; emit(:type, :linebreak, text, ts - 1, te) when '\X'; emit(:type, :xgrapheme, text, ts - 1, te) 
end begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 14 then # line 16 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/property.rl" begin te = p+1 begin text = text(data, ts, te, 1).first type = (text[1] == 'P') ^ (text[3] == '^') ? :nonproperty : :property name = data[ts+2..te-2].pack('c*').gsub(/[\^\s_\-]/, '').downcase token = self.class.short_prop_map[name] || self.class.long_prop_map[name] raise UnknownUnicodePropertyError.new(name) unless token self.emit(type, token.to_sym, text, ts-1, te) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 18 then # line 171 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin # special case, emits two tokens emit(:literal, :literal, '-', ts, te) emit(:set, :intersection, '&&', ts, te) end end when 66 then # line 176 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te).first if tokens.last[1] == :open emit(:set, :negate, text, ts, te) else emit(:literal, :literal, text, ts, te) end end end when 68 then # line 197 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:set, :intersection, *text(data, ts, te)) end end when 64 then # line 201 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin begin stack[top] = cs top+= 1 cs = 136 _goto_level = _again next end end end when 62 then # line 231 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:literal, :literal, *text(data, ts, te)) end end when 16 then # line 239 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin char, *rest = *text(data, ts, te) char.force_encoding('utf-8') if char.respond_to?(:force_encoding) emit(:literal, :literal, char, *rest) end end when 69 then # line 185 
"/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin text = text(data, ts, te).first # ranges cant start with a subset or intersection/negation/range operator if tokens.last[0] == :set emit(:literal, :literal, text, ts, te) else emit(:set, :range, text, ts, te) end end end when 72 then # line 205 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:set, :open, *text(data, ts, te)) begin stack[top] = cs top+= 1 cs = 129 _goto_level = _again next end end end when 67 then # line 239 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin char, *rest = *text(data, ts, te) char.force_encoding('utf-8') if char.respond_to?(:force_encoding) emit(:literal, :literal, char, *rest) end end when 17 then # line 185 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin text = text(data, ts, te).first # ranges cant start with a subset or intersection/negation/range operator if tokens.last[0] == :set emit(:literal, :literal, text, ts, te) else emit(:set, :range, text, ts, te) end end end when 20 then # line 205 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin emit(:set, :open, *text(data, ts, te)) begin stack[top] = cs top+= 1 cs = 129 _goto_level = _again next end end end when 15 then # line 239 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin char, *rest = *text(data, ts, te) char.force_encoding('utf-8') if char.respond_to?(:force_encoding) emit(:literal, :literal, char, *rest) end end when 74 then # line 249 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:escape, :literal, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end 
end end when 73 then # line 254 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin p = p - 1; cs = 129; begin stack[top] = cs top+= 1 cs = 137 _goto_level = _again next end end end when 79 then # line 265 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te, 1).first emit(:backref, :number, text, ts-1, te) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 85 then # line 271 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:escape, :octal, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 76 then # line 276 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te, 1).first when '\.'; emit(:escape, :dot, text, ts-1, te) when '\|'; emit(:escape, :alternation, text, ts-1, te) when '\^'; emit(:escape, :bol, text, ts-1, te) when '\$'; emit(:escape, :eol, text, ts-1, te) when '\?'; emit(:escape, :zero_or_one, text, ts-1, te) when '\*'; emit(:escape, :zero_or_more, text, ts-1, te) when '\+'; emit(:escape, :one_or_more, text, ts-1, te) when '\('; emit(:escape, :group_open, text, ts-1, te) when '\)'; emit(:escape, :group_close, text, ts-1, te) when '\{'; emit(:escape, :interval_open, text, ts-1, te) when '\}'; emit(:escape, :interval_close, text, ts-1, te) when '\['; emit(:escape, :set_open, text, ts-1, te) when '\]'; emit(:escape, :set_close, text, ts-1, te) when "\\\\"; emit(:escape, :backslash, text, ts-1, te) end begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 82 then # line 297 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin # \b is emitted as backspace only when inside a character set, otherwise # it is a word boundary anchor. A syntax might "normalize" it if needed. 
case text = text(data, ts, te, 1).first when '\a'; emit(:escape, :bell, text, ts-1, te) when '\b'; emit(:escape, :backspace, text, ts-1, te) when '\e'; emit(:escape, :escape, text, ts-1, te) when '\f'; emit(:escape, :form_feed, text, ts-1, te) when '\n'; emit(:escape, :newline, text, ts-1, te) when '\r'; emit(:escape, :carriage, text, ts-1, te) when '\t'; emit(:escape, :tab, text, ts-1, te) when '\v'; emit(:escape, :vertical_tab, text, ts-1, te) end begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 28 then # line 313 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te, 1).first if text[2].chr == '{' emit(:escape, :codepoint_list, text, ts-1, te) else emit(:escape, :codepoint, text, ts-1, te) end begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 92 then # line 323 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:escape, :hex, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 24 then # line 332 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit_meta_control_sequence(data, ts, te, :control) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 26 then # line 337 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit_meta_control_sequence(data, ts, te, :meta_sequence) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 80 then # line 342 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin p = p - 1; cs = ((in_set? ? 129 : 110)); begin stack[top] = cs top+= 1 cs = 127 _goto_level = _again next end end end when 81 then # line 348 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin p = p - 1; cs = ((in_set? 
? 129 : 110)); begin stack[top] = cs top+= 1 cs = 128 _goto_level = _again next end end end when 75 then # line 354 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:escape, :literal, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 84 then # line 271 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:escape, :octal, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 91 then # line 323 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:escape, :hex, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 87 then # line 332 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit_meta_control_sequence(data, ts, te, :control) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 89 then # line 337 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit_meta_control_sequence(data, ts, te, :meta_sequence) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 83 then # line 1 "NONE" begin case act when 18 then begin begin p = ((te))-1; end text = text(data, ts, te, 1).first emit(:backref, :number, text, ts-1, te) begin top -= 1 cs = stack[top] _goto_level = _again next end end when 19 then begin begin p = ((te))-1; end emit(:escape, :octal, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end end when 31 then # line 364 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te-1).first emit(:conditional, :condition, text, ts, te-1) emit(:conditional, :condition_close, ')', te-1, te) end end 
when 93 then # line 370 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin p = p - 1; begin stack[top] = cs top+= 1 cs = 110 _goto_level = _again next end end end when 94 then # line 370 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin p = p - 1; begin stack[top] = cs top+= 1 cs = 110 _goto_level = _again next end end end when 30 then # line 370 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin p = p - 1; begin stack[top] = cs top+= 1 cs = 110 _goto_level = _again next end end end when 38 then # line 383 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:meta, :dot, *text(data, ts, te)) end end when 41 then # line 387 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin if conditional_stack.last == group_depth emit(:conditional, :separator, *text(data, ts, te)) else emit(:meta, :alternation, *text(data, ts, te)) end end end when 40 then # line 397 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:anchor, :bol, *text(data, ts, te)) end end when 35 then # line 401 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:anchor, :eol, *text(data, ts, te)) end end when 57 then # line 405 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:keep, :mark, *text(data, ts, te)) end end when 56 then # line 409 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when '\\A'; emit(:anchor, :bos, text, ts, te) when '\\z'; emit(:anchor, :eos, text, ts, te) when '\\Z'; emit(:anchor, :eos_ob_eol, text, ts, te) when '\\b'; emit(:anchor, :word_boundary, text, ts, 
te) when '\\B'; emit(:anchor, :nonword_boundary, text, ts, te) when '\\G'; emit(:anchor, :match_start, text, ts, te) end end end when 47 then # line 431 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te).first conditional_stack << group_depth emit(:conditional, :open, text[0..-2], ts, te-1) emit(:conditional, :condition_open, '(', te-1, te) begin stack[top] = cs top+= 1 cs = 148 _goto_level = _again next end end end when 48 then # line 462 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te).first if text[2..-1] =~ /([^\-mixdau:]|^$)|-.*([dau])/ raise InvalidGroupOption.new($1 || "-#{$2}", text) end emit_options(text, ts, te) end end when 9 then # line 476 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when '(?='; emit(:assertion, :lookahead, text, ts, te) when '(?!'; emit(:assertion, :nlookahead, text, ts, te) when '(?<='; emit(:assertion, :lookbehind, text, ts, te) when '(?'; emit(:group, :atomic, text, ts, te) when '(?~'; emit(:group, :absence, text, ts, te) when /^\(\?(?:<>|'')/ validation_error(:group, 'named group', 'name is empty') when /^\(\?<\w*>/ emit(:group, :named_ab, text, ts, te) when /^\(\?'\w*'/ emit(:group, :named_sq, text, ts, te) end end end when 11 then # line 534 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when /^\\([gk])(<>|'')/ # angle brackets validation_error(:backref, 'ref/call', 'ref ID is empty') when /^\\([gk])<[^\d+-]\w*>/ # angle-brackets if $1 == 'k' emit(:backref, :name_ref_ab, text, ts, te) else emit(:backref, :name_call_ab, text, ts, te) end when /^\\([gk])'[^\d+-]\w*'/ #single quotes if $1 == 'k' emit(:backref, :name_ref_sq, text, ts, te) else emit(:backref, :name_call_sq, text, ts, te) end when 
/^\\([gk])<\d+>/ # angle-brackets if $1 == 'k' emit(:backref, :number_ref_ab, text, ts, te) else emit(:backref, :number_call_ab, text, ts, te) end when /^\\([gk])'\d+'/ # single quotes if $1 == 'k' emit(:backref, :number_ref_sq, text, ts, te) else emit(:backref, :number_call_sq, text, ts, te) end when /^\\(?:g<\+|g<-|(k)<-)\d+>/ # angle-brackets if $1 == 'k' emit(:backref, :number_rel_ref_ab, text, ts, te) else emit(:backref, :number_rel_call_ab, text, ts, te) end when /^\\(?:g'\+|g'-|(k)'-)\d+'/ # single quotes if $1 == 'k' emit(:backref, :number_rel_ref_sq, text, ts, te) else emit(:backref, :number_rel_call_sq, text, ts, te) end when /^\\k<[^\d+\-]\w*[+\-]\d+>/ # angle-brackets emit(:backref, :name_recursion_ref_ab, text, ts, te) when /^\\k'[^\d+\-]\w*[+\-]\d+'/ # single-quotes emit(:backref, :name_recursion_ref_sq, text, ts, te) when /^\\([gk])<[+\-]?\d+[+\-]\d+>/ # angle-brackets emit(:backref, :number_recursion_ref_ab, text, ts, te) when /^\\([gk])'[+\-]?\d+[+\-]\d+'/ # single-quotes emit(:backref, :number_recursion_ref_sq, text, ts, te) end end end when 54 then # line 599 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when '?' 
; emit(:quantifier, :zero_or_one, text, ts, te) when '??'; emit(:quantifier, :zero_or_one_reluctant, text, ts, te) when '?+'; emit(:quantifier, :zero_or_one_possessive, text, ts, te) end end end when 50 then # line 607 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when '*' ; emit(:quantifier, :zero_or_more, text, ts, te) when '*?'; emit(:quantifier, :zero_or_more_reluctant, text, ts, te) when '*+'; emit(:quantifier, :zero_or_more_possessive, text, ts, te) end end end when 52 then # line 615 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin case text = text(data, ts, te).first when '+' ; emit(:quantifier, :one_or_more, text, ts, te) when '+?'; emit(:quantifier, :one_or_more_reluctant, text, ts, te) when '++'; emit(:quantifier, :one_or_more_possessive, text, ts, te) end end end when 59 then # line 623 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:quantifier, :interval, *text(data, ts, te)) end end when 4 then # line 633 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin if free_spacing emit(:free_space, :comment, *text(data, ts, te)) else append_literal(data, ts, te) end end end when 46 then # line 462 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin text = text(data, ts, te).first if text[2..-1] =~ /([^\-mixdau:]|^$)|-.*([dau])/ raise InvalidGroupOption.new($1 || "-#{$2}", text) end emit_options(text, ts, te) end end when 44 then # line 511 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin text = text(data, ts, te).first emit(:group, :capture, text, ts, te) end end when 53 then # line 599 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin 
case text = text(data, ts, te).first when '?' ; emit(:quantifier, :zero_or_one, text, ts, te) when '??'; emit(:quantifier, :zero_or_one_reluctant, text, ts, te) when '?+'; emit(:quantifier, :zero_or_one_possessive, text, ts, te) end end end when 49 then # line 607 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin case text = text(data, ts, te).first when '*' ; emit(:quantifier, :zero_or_more, text, ts, te) when '*?'; emit(:quantifier, :zero_or_more_reluctant, text, ts, te) when '*+'; emit(:quantifier, :zero_or_more_possessive, text, ts, te) end end end when 51 then # line 615 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin case text = text(data, ts, te).first when '+' ; emit(:quantifier, :one_or_more, text, ts, te) when '+?'; emit(:quantifier, :one_or_more_reluctant, text, ts, te) when '++'; emit(:quantifier, :one_or_more_possessive, text, ts, te) end end end when 58 then # line 623 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:quantifier, :interval, *text(data, ts, te)) end end when 55 then # line 629 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin begin stack[top] = cs top+= 1 cs = 137 _goto_level = _again next end end end when 43 then # line 641 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin if free_spacing emit(:free_space, :whitespace, *text(data, ts, te)) else append_literal(data, ts, te) end end end when 42 then # line 656 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin append_literal(data, ts, te) end end when 5 then # line 462 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin text = text(data, ts, te).first if 
text[2..-1] =~ /([^\-mixdau:]|^$)|-.*([dau])/ raise InvalidGroupOption.new($1 || "-#{$2}", text) end emit_options(text, ts, te) end end when 10 then # line 629 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin begin stack[top] = cs top+= 1 cs = 137 _goto_level = _again next end end end when 3 then # line 656 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin append_literal(data, ts, te) end end when 1 then # line 1 "NONE" begin case act when 0 then begin begin cs = 0 _goto_level = _again next end end when 54 then begin begin p = ((te))-1; end append_literal(data, ts, te) end end end when 71 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 205 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:set, :open, *text(data, ts, te)) begin stack[top] = cs top+= 1 cs = 129 _goto_level = _again next end end end when 19 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 205 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin emit(:set, :open, *text(data, ts, te)) begin stack[top] = cs top+= 1 cs = 129 _goto_level = _again next end end end when 90 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? 
copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 323 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit(:escape, :hex, *text(data, ts, te, 1)) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 86 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 332 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit_meta_control_sequence(data, ts, te, :control) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 88 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 337 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p p = p - 1; begin emit_meta_control_sequence(data, ts, te, :meta_sequence) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 25 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 332 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin emit_meta_control_sequence(data, ts, te, :control) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 27 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? 
copy(data, ts-1..-1) : data.pack('c*') raise PrematureEndError.new( text ) end # line 337 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin begin p = ((te))-1; end begin emit_meta_control_sequence(data, ts, te, :meta_sequence) begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 29 then # line 137 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? copy(data, ts-1..-1) : data.pack('c*') validation_error(:sequence, 'sequence', text) end # line 328 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin begin top -= 1 cs = stack[top] _goto_level = _again next end end end when 7 then # line 144 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth - 1 end # line 447 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:group, :comment, *text(data, ts, te)) end end when 37 then # line 144 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth - 1 end # line 516 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin if conditional_stack.last == group_depth + 1 conditional_stack.pop emit(:conditional, :close, *text(data, ts, te)) else if spacing_stack.length > 1 && spacing_stack.last[:depth] == group_depth + 1 spacing_stack.pop self.free_spacing = spacing_stack.last[:free_spacing] end emit(:group, :close, *text(data, ts, te)) end end end when 39 then # line 145 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth + 1 end # line 422 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:set, :open, *text(data, ts, te)) begin stack[top] = cs top+= 1 cs = 129 _goto_level = _again next end end end 
when 65 then # line 146 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth - 1 end # line 152 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:set, :close, *text(data, ts, te)) if in_set? begin top -= 1 cs = stack[top] _goto_level = _again next end else begin cs = 110 _goto_level = _again next end end end end when 70 then # line 146 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth - 1 end # line 161 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin # special case, emits two tokens emit(:literal, :literal, copy(data, ts..te-2), ts, te - 1) emit(:set, :close, copy(data, ts+1..te-1), ts + 1, te) if in_set? begin top -= 1 cs = stack[top] _goto_level = _again next end else begin cs = 110 _goto_level = _again next end end end end when 22 then # line 146 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth - 1 end # line 210 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin text = text(data, ts, te).first type = :posixclass class_name = text[2..-3] if class_name[0].chr == '^' class_name = class_name[1..-1] type = :nonposixclass end emit(type, class_name.to_sym, text, ts, te) end end when 21 then # line 146 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth - 1 end # line 223 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 begin emit(:set, :collation, *text(data, ts, te)) end end when 23 then # line 146 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth - 1 end # line 227 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin te = p+1 
begin emit(:set, :equivalent, *text(data, ts, te)) end end when 63 then # line 1 "NONE" begin te = p+1 end # line 145 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.set_depth = set_depth + 1 end when 78 then # line 1 "NONE" begin te = p+1 end # line 265 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin act = 18; end when 77 then # line 1 "NONE" begin te = p+1 end # line 271 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin act = 19; end when 2 then # line 1 "NONE" begin te = p+1 end # line 656 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin act = 54; end when 45 then # line 1 "NONE" begin te = p+1 end # line 144 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth - 1 end # line 143 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin self.group_depth = group_depth + 1 end # line 2563 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" end end end if _goto_level <= _again case _re_scanner_to_state_actions[cs] when 60 then # line 1 "NONE" begin ts = nil; end when 32 then # line 1 "NONE" begin ts = nil; end # line 1 "NONE" begin act = 0 end # line 2581 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb" end if cs == 0 _goto_level = _out next end p += 1 if p != pe _goto_level = _resume next end end if _goto_level <= _test_eof if p == eof if _re_scanner_eof_trans[cs] > 0 _trans = _re_scanner_eof_trans[cs] - 1; _goto_level = _eof_trans next; end case _re_scanner_eof_actions[cs] when 13 then # line 8 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/property.rl" begin raise PrematureEndError.new('unicode property') end when 12 then # line 131 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl" begin text = ts ? 
copy(data, ts-1..-1) : data.pack('c*')
    raise PrematureEndError.new( text )
          end
# line 2615 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner.rb"
        end
      end
    end
    if _goto_level <= _out
      break
    end
  end
end

# line 755 "/Users/jannoschmuller/code/regexp_parser/lib/regexp_parser/scanner/scanner.rl"

    # to avoid "warning: assigned but unused variable - testEof"
    testEof = testEof

    if cs == re_scanner_error
      text = ts ? copy(data, ts-1..-1) : data.pack('c*')
      raise ScannerError.new("Scan error at '#{text}'")
    end

    raise PrematureEndError.new("(missing group closing parenthesis) "+
          "[#{group_depth}]") if in_group?
    raise PrematureEndError.new("(missing set closing bracket) "+
          "[#{set_depth}]") if in_set?

    # when the entire expression is a literal run
    emit_literal if literal

    tokens
  end

  # lazy-load property maps when first needed
  require 'yaml'
  PROP_MAPS_DIR = File.expand_path('../scanner/properties', __FILE__)

  def self.short_prop_map
    @short_prop_map ||= YAML.load_file("#{PROP_MAPS_DIR}/short.yml")
  end

  def self.long_prop_map
    @long_prop_map ||= YAML.load_file("#{PROP_MAPS_DIR}/long.yml")
  end

  # Emits an array with the details of the scanned pattern
  def emit(type, token, text, ts, te)
    #puts "EMIT: type: #{type}, token: #{token}, text: #{text}, ts: #{ts}, te: #{te}"

    emit_literal if literal

    if block
      block.call type, token, text, ts, te
    end

    tokens << [type, token, text, ts, te]
  end

  private

  attr_accessor :tokens, :literal, :block, :free_spacing, :spacing_stack,
                :group_depth, :set_depth, :conditional_stack

  def in_group?
    group_depth > 0
  end

  def in_set?
    set_depth > 0
  end

  # Copy from ts to te from data as text
  def copy(data, range)
    data[range].pack('c*')
  end

  # Copy from ts to te from data as text, returning an array with the text
  # and the offsets used to copy it.
  def text(data, ts, te, soff = 0)
    [copy(data, ts-soff..te-1), ts-soff, te]
  end

  # Appends one or more characters to the literal buffer, to be emitted later
  # by a call to emit_literal.
  # Contents can be a mix of ASCII and UTF-8.
  def append_literal(data, ts, te)
    self.literal = literal || []
    literal << text(data, ts, te)
  end

  # Emits the literal run collected by calls to the append_literal method,
  # using the total start (ts) and end (te) offsets of the run.
  def emit_literal
    ts, te = literal.first[1], literal.last[2]
    text = literal.map {|t| t[0]}.join

    text.force_encoding('utf-8') if text.respond_to?(:force_encoding)

    self.literal = nil
    emit(:literal, :literal, text, ts, te)
  end

  def emit_options(text, ts, te)
    token = nil

    # Ruby allows things like '(?-xxxx)' or '(?xx-xx--xx-:abc)'.
    text =~ /\(\?([mixdau]*)(-(?:[mix]*))*(:)?/
    positive, negative, group_local = $1, $2, $3

    if positive.include?('x')
      self.free_spacing = true
    end

    # If the x appears in both, treat it like ruby does, the second cancels
    # the first.
    if negative && negative.include?('x')
      self.free_spacing = false
    end

    if group_local
      spacing_stack << {:free_spacing => free_spacing, :depth => group_depth}
      token = :options
    else
      # switch for parent group level
      spacing_stack.last[:free_spacing] = free_spacing
      token = :options_switch
    end

    emit(:group, token, text, ts, te)
  end

  def emit_meta_control_sequence(data, ts, te, token)
    if data.last < 0x00 || data.last > 0x7F
      validation_error(:sequence, 'escape', token.to_s)
    end
    emit(:escape, token, *text(data, ts, te, 1))
  end

  # Centralizes and unifies the handling of validation related
  # errors.
  def validation_error(type, what, reason)
    case type
    when :group
      error = InvalidGroupError.new(what, reason)
    when :backref
      error = InvalidBackrefError.new(what, reason)
    when :sequence
      error = InvalidSequenceError.new(what, reason)
    end

    raise error # unless @@config.validation_ignore
  end
end # class Regexp::Scanner

regexp_parser-1.6.0/lib/regexp_parser/parser.rb

require 'regexp_parser/expression'

class Regexp::Parser
  include Regexp::Expression
  include Regexp::Syntax

  class ParserError < StandardError; end

  class UnknownTokenTypeError < ParserError
    def initialize(type, token)
      super "Unknown token type #{type} #{token.inspect}"
    end
  end

  class UnknownTokenError < ParserError
    def initialize(type, token)
      super "Unknown #{type} token #{token.token}"
    end
  end

  def self.parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
    new.parse(input, syntax, &block)
  end

  def parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
    root = Root.build(options_from_input(input))

    self.root = root
    self.node = root
    self.nesting = [root]

    self.options_stack = [root.options]
    self.switching_options = false
    self.conditional_nesting = []

    self.captured_group_counts = Hash.new(0)

    Regexp::Lexer.scan(input, syntax) do |token|
      parse_token(token)
    end

    assign_referenced_expressions

    if block_given?
      block.call(root)
    else
      root
    end
  end

  private

  attr_accessor :root, :node, :nesting,
                :options_stack, :switching_options, :conditional_nesting,
                :captured_group_counts

  def options_from_input(input)
    return {} unless input.is_a?(::Regexp)

    options = {}
    options[:i] = true if input.options & ::Regexp::IGNORECASE != 0
    options[:m] = true if input.options & ::Regexp::MULTILINE != 0
    options[:x] = true if input.options & ::Regexp::EXTENDED != 0
    options
  end

  def nest(exp)
    nesting.push(exp)
    node << exp
    update_transplanted_subtree(exp, node)
    self.node = exp
  end

  # subtrees are transplanted to build Alternations, Intersections, Ranges
  def update_transplanted_subtree(exp, new_parent)
    exp.nesting_level = new_parent.nesting_level + 1
    exp.respond_to?(:each) &&
      exp.each { |subexp| update_transplanted_subtree(subexp, exp) }
  end

  def decrease_nesting
    while nesting.last.is_a?(SequenceOperation)
      nesting.pop
      self.node = nesting.last
    end
    nesting.pop
    yield(node) if block_given?
    self.node = nesting.last
    self.node = node.last if node.last.is_a?(SequenceOperation)
  end

  def nest_conditional(exp)
    conditional_nesting.push(exp)
    nest(exp)
  end

  def parse_token(token)
    close_completed_character_set_range

    case token.type
    when :meta; meta(token)
    when :quantifier; quantifier(token)
    when :anchor; anchor(token)
    when :escape; escape(token)
    when :group; group(token)
    when :assertion; group(token)
    when :set; set(token)
    when :type; type(token)
    when :backref; backref(token)
    when :conditional; conditional(token)
    when :keep; keep(token)

    when :posixclass, :nonposixclass
      posixclass(token)
    when :property, :nonproperty
      property(token)

    when :literal
      node << Literal.new(token, active_opts)
    when :free_space
      free_space(token)

    else
      raise UnknownTokenTypeError.new(token.type, token)
    end
  end

  def set(token)
    case token.token
    when :open
      open_set(token)
    when :close
      close_set
    when :negate
      negate_set
    when :range
      range(token)
    when :intersection
      intersection(token)
    when :collation, :equivalent
      node << Literal.new(token, active_opts)
    else
      raise UnknownTokenError.new('CharacterSet', token)
    end
  end

  def meta(token)
    case token.token
    when :dot
      node << CharacterType::Any.new(token, active_opts)
    when :alternation
      sequence_operation(Alternation, token)
    else
      raise UnknownTokenError.new('Meta', token)
    end
  end

  def backref(token)
    case token.token
    when :name_ref
      node << Backreference::Name.new(token, active_opts)
    when :name_recursion_ref
      node << Backreference::NameRecursionLevel.new(token, active_opts)
    when :name_call
      node << Backreference::NameCall.new(token, active_opts)
    when :number, :number_ref
      node << Backreference::Number.new(token, active_opts)
    when :number_recursion_ref
      node << Backreference::NumberRecursionLevel.new(token, active_opts)
    when :number_call
      node << Backreference::NumberCall.new(token, active_opts)
    when :number_rel_ref
      node << Backreference::NumberRelative.new(token, active_opts).tap do |exp|
        assign_effective_number(exp)
      end
    when :number_rel_call
      node << Backreference::NumberCallRelative.new(token, active_opts).tap do |exp|
        assign_effective_number(exp)
      end
    else
      raise UnknownTokenError.new('Backreference', token)
    end
  end

  def type(token)
    case token.token
    when :digit
      node << CharacterType::Digit.new(token, active_opts)
    when :nondigit
      node << CharacterType::NonDigit.new(token, active_opts)
    when :hex
      node << CharacterType::Hex.new(token, active_opts)
    when :nonhex
      node << CharacterType::NonHex.new(token, active_opts)
    when :space
      node << CharacterType::Space.new(token, active_opts)
    when :nonspace
      node << CharacterType::NonSpace.new(token, active_opts)
    when :word
      node << CharacterType::Word.new(token, active_opts)
    when :nonword
      node << CharacterType::NonWord.new(token, active_opts)
    when :linebreak
      node << CharacterType::Linebreak.new(token, active_opts)
    when :xgrapheme
      node << CharacterType::ExtendedGrapheme.new(token, active_opts)
    else
      raise UnknownTokenError.new('CharacterType', token)
    end
  end

  def conditional(token)
    case token.token
    when :open
      nest_conditional(Conditional::Expression.new(token,
        active_opts))

    when :condition
      conditional_nesting.last.condition =
        Conditional::Condition.new(token, active_opts)
      conditional_nesting.last.add_sequence(active_opts)

    when :separator
      conditional_nesting.last.add_sequence(active_opts)
      self.node = conditional_nesting.last.branches.last

    when :close
      conditional_nesting.pop
      decrease_nesting

      self.node =
        if conditional_nesting.empty?
          nesting.last
        else
          conditional_nesting.last
        end

    else
      raise UnknownTokenError.new('Conditional', token)
    end
  end

  def posixclass(token)
    node << PosixClass.new(token, active_opts)
  end

  include Regexp::Expression::UnicodeProperty

  def property(token)
    case token.token
    when :alnum; node << Alnum.new(token, active_opts)
    when :alpha; node << Alpha.new(token, active_opts)
    when :ascii; node << Ascii.new(token, active_opts)
    when :blank; node << Blank.new(token, active_opts)
    when :cntrl; node << Cntrl.new(token, active_opts)
    when :digit; node << Digit.new(token, active_opts)
    when :graph; node << Graph.new(token, active_opts)
    when :lower; node << Lower.new(token, active_opts)
    when :print; node << Print.new(token, active_opts)
    when :punct; node << Punct.new(token, active_opts)
    when :space; node << Space.new(token, active_opts)
    when :upper; node << Upper.new(token, active_opts)
    when :word; node << Word.new(token, active_opts)
    when :xdigit; node << Xdigit.new(token, active_opts)
    when :xposixpunct; node << XPosixPunct.new(token, active_opts)

    # only in Oniguruma (old rubies)
    when :newline; node << Newline.new(token, active_opts)

    when :any; node << Any.new(token, active_opts)
    when :assigned; node << Assigned.new(token, active_opts)

    when :letter; node << Letter::Any.new(token, active_opts)
    when :cased_letter; node << Letter::Cased.new(token, active_opts)
    when :uppercase_letter; node << Letter::Uppercase.new(token, active_opts)
    when :lowercase_letter; node << Letter::Lowercase.new(token, active_opts)
    when :titlecase_letter; node << Letter::Titlecase.new(token, active_opts)
    when :modifier_letter; node <<
      Letter::Modifier.new(token, active_opts)
    when :other_letter; node << Letter::Other.new(token, active_opts)

    when :mark; node << Mark::Any.new(token, active_opts)
    when :combining_mark; node << Mark::Combining.new(token, active_opts)
    when :nonspacing_mark; node << Mark::Nonspacing.new(token, active_opts)
    when :spacing_mark; node << Mark::Spacing.new(token, active_opts)
    when :enclosing_mark; node << Mark::Enclosing.new(token, active_opts)

    when :number; node << Number::Any.new(token, active_opts)
    when :decimal_number; node << Number::Decimal.new(token, active_opts)
    when :letter_number; node << Number::Letter.new(token, active_opts)
    when :other_number; node << Number::Other.new(token, active_opts)

    when :punctuation; node << Punctuation::Any.new(token, active_opts)
    when :connector_punctuation; node << Punctuation::Connector.new(token, active_opts)
    when :dash_punctuation; node << Punctuation::Dash.new(token, active_opts)
    when :open_punctuation; node << Punctuation::Open.new(token, active_opts)
    when :close_punctuation; node << Punctuation::Close.new(token, active_opts)
    when :initial_punctuation; node << Punctuation::Initial.new(token, active_opts)
    when :final_punctuation; node << Punctuation::Final.new(token, active_opts)
    when :other_punctuation; node << Punctuation::Other.new(token, active_opts)

    when :separator; node << Separator::Any.new(token, active_opts)
    when :space_separator; node << Separator::Space.new(token, active_opts)
    when :line_separator; node << Separator::Line.new(token, active_opts)
    when :paragraph_separator; node << Separator::Paragraph.new(token, active_opts)

    when :symbol; node << Symbol::Any.new(token, active_opts)
    when :math_symbol; node << Symbol::Math.new(token, active_opts)
    when :currency_symbol; node << Symbol::Currency.new(token, active_opts)
    when :modifier_symbol; node << Symbol::Modifier.new(token, active_opts)
    when :other_symbol; node << Symbol::Other.new(token, active_opts)

    when :other; node << Codepoint::Any.new(token, active_opts)
    when
      :control; node << Codepoint::Control.new(token, active_opts)
    when :format; node << Codepoint::Format.new(token, active_opts)
    when :surrogate; node << Codepoint::Surrogate.new(token, active_opts)
    when :private_use; node << Codepoint::PrivateUse.new(token, active_opts)
    when :unassigned; node << Codepoint::Unassigned.new(token, active_opts)

    when *Token::UnicodeProperty::Age
      node << Age.new(token, active_opts)

    when *Token::UnicodeProperty::Derived
      node << Derived.new(token, active_opts)

    when *Token::UnicodeProperty::Emoji
      node << Emoji.new(token, active_opts)

    when *Token::UnicodeProperty::Script
      node << Script.new(token, active_opts)

    when *Token::UnicodeProperty::UnicodeBlock
      node << Block.new(token, active_opts)

    else
      raise UnknownTokenError.new('UnicodeProperty', token)
    end
  end

  def anchor(token)
    case token.token
    when :bol
      node << Anchor::BeginningOfLine.new(token, active_opts)
    when :eol
      node << Anchor::EndOfLine.new(token, active_opts)
    when :bos
      node << Anchor::BOS.new(token, active_opts)
    when :eos
      node << Anchor::EOS.new(token, active_opts)
    when :eos_ob_eol
      node << Anchor::EOSobEOL.new(token, active_opts)
    when :word_boundary
      node << Anchor::WordBoundary.new(token, active_opts)
    when :nonword_boundary
      node << Anchor::NonWordBoundary.new(token, active_opts)
    when :match_start
      node << Anchor::MatchStart.new(token, active_opts)
    else
      raise UnknownTokenError.new('Anchor', token)
    end
  end

  def escape(token)
    case token.token

    when :backspace
      node << EscapeSequence::Backspace.new(token, active_opts)

    when :escape
      node << EscapeSequence::AsciiEscape.new(token, active_opts)
    when :bell
      node << EscapeSequence::Bell.new(token, active_opts)
    when :form_feed
      node << EscapeSequence::FormFeed.new(token, active_opts)
    when :newline
      node << EscapeSequence::Newline.new(token, active_opts)
    when :carriage
      node << EscapeSequence::Return.new(token, active_opts)
    when :tab
      node << EscapeSequence::Tab.new(token, active_opts)
    when :vertical_tab
      node << EscapeSequence::VerticalTab.new(token, active_opts)
    when :hex
      node << EscapeSequence::Hex.new(token, active_opts)
    when :octal
      node << EscapeSequence::Octal.new(token, active_opts)
    when :codepoint
      node << EscapeSequence::Codepoint.new(token, active_opts)
    when :codepoint_list
      node << EscapeSequence::CodepointList.new(token, active_opts)

    when :control
      if token.text =~ /\A(?:\\C-\\M|\\c\\M)/
        node << EscapeSequence::MetaControl.new(token, active_opts)
      else
        node << EscapeSequence::Control.new(token, active_opts)
      end

    when :meta_sequence
      if token.text =~ /\A\\M-\\[Cc]/
        node << EscapeSequence::MetaControl.new(token, active_opts)
      else
        node << EscapeSequence::Meta.new(token, active_opts)
      end

    else
      # treating everything else as a literal
      node << EscapeSequence::Literal.new(token, active_opts)
    end
  end

  def keep(token)
    node << Keep::Mark.new(token, active_opts)
  end

  def free_space(token)
    case token.token
    when :comment
      node << Comment.new(token, active_opts)
    when :whitespace
      if node.last.is_a?(WhiteSpace)
        node.last.merge(WhiteSpace.new(token, active_opts))
      else
        node << WhiteSpace.new(token, active_opts)
      end
    else
      raise UnknownTokenError.new('FreeSpace', token)
    end
  end

  def quantifier(token)
    offset = -1
    target_node = node.expressions[offset]
    while target_node.is_a?(FreeSpace)
      target_node = node.expressions[offset -= 1]
    end

    target_node || raise(ArgumentError, 'No valid target found for '\
                                        "'#{token.text}' ")

    case token.token
    when :zero_or_one
      target_node.quantify(:zero_or_one, token.text, 0, 1, :greedy)
    when :zero_or_one_reluctant
      target_node.quantify(:zero_or_one, token.text, 0, 1, :reluctant)
    when :zero_or_one_possessive
      target_node.quantify(:zero_or_one, token.text, 0, 1, :possessive)

    when :zero_or_more
      target_node.quantify(:zero_or_more, token.text, 0, -1, :greedy)
    when :zero_or_more_reluctant
      target_node.quantify(:zero_or_more, token.text, 0, -1, :reluctant)
    when :zero_or_more_possessive
      target_node.quantify(:zero_or_more, token.text, 0, -1, :possessive)

    when :one_or_more
      target_node.quantify(:one_or_more, token.text, 1, -1,
        :greedy)
    when :one_or_more_reluctant
      target_node.quantify(:one_or_more, token.text, 1, -1, :reluctant)
    when :one_or_more_possessive
      target_node.quantify(:one_or_more, token.text, 1, -1, :possessive)

    when :interval
      interval(target_node, token)

    else
      raise UnknownTokenError.new('Quantifier', token)
    end
  end

  def interval(target_node, token)
    text = token.text
    mchr = text[text.length-1].chr =~ /[?+]/ ? text[text.length-1].chr : nil
    case mchr
    when '?'
      range_text = text[0...-1]
      mode = :reluctant
    when '+'
      range_text = text[0...-1]
      mode = :possessive
    else
      range_text = text
      mode = :greedy
    end

    range = range_text.gsub(/\{|\}/, '').split(',', 2)
    min = range[0].empty? ? 0 : range[0]
    max = range[1] ? (range[1].empty? ? -1 : range[1]) : min

    target_node.quantify(:interval, text, min.to_i, max.to_i, mode)
  end

  def group(token)
    case token.token
    when :options, :options_switch
      options_group(token)
    when :close
      close_group
    when :comment
      node << Group::Comment.new(token, active_opts)
    else
      open_group(token)
    end
  end

  MOD_FLAGS = %w[i m x].map(&:to_sym)
  ENC_FLAGS = %w[a d u].map(&:to_sym)

  def options_group(token)
    positive, negative = token.text.split('-', 2)
    negative ||= ''
    self.switching_options = token.token.equal?(:options_switch)

    opt_changes = {}
    new_active_opts = active_opts.dup

    MOD_FLAGS.each do |flag|
      if positive.include?(flag.to_s)
        opt_changes[flag] = new_active_opts[flag] = true
      end
      if negative.include?(flag.to_s)
        opt_changes[flag] = false
        new_active_opts.delete(flag)
      end
    end

    if (enc_flag = positive.reverse[/[adu]/])
      enc_flag = enc_flag.to_sym
      (ENC_FLAGS - [enc_flag]).each do |other|
        opt_changes[other] = false if new_active_opts[other]
        new_active_opts.delete(other)
      end
      opt_changes[enc_flag] = new_active_opts[enc_flag] = true
    end

    options_stack << new_active_opts

    options_group = Group::Options.new(token, active_opts)
    options_group.option_changes = opt_changes

    nest(options_group)
  end

  def open_group(token)
    case token.token
    when :passive
      exp = Group::Passive.new(token, active_opts)
    when
      :atomic
      exp = Group::Atomic.new(token, active_opts)
    when :named
      exp = Group::Named.new(token, active_opts)
    when :capture
      exp = Group::Capture.new(token, active_opts)
    when :absence
      exp = Group::Absence.new(token, active_opts)

    when :lookahead
      exp = Assertion::Lookahead.new(token, active_opts)
    when :nlookahead
      exp = Assertion::NegativeLookahead.new(token, active_opts)
    when :lookbehind
      exp = Assertion::Lookbehind.new(token, active_opts)
    when :nlookbehind
      exp = Assertion::NegativeLookbehind.new(token, active_opts)

    else
      raise UnknownTokenError.new('Group type open', token)
    end

    if exp.capturing?
      exp.number = total_captured_group_count + 1
      exp.number_at_level = captured_group_count_at_level + 1
      count_captured_group
    end

    # Push the active options to the stack again. This way we can simply pop the
    # stack for any group we close, no matter if it had its own options or not.
    options_stack << active_opts

    nest(exp)
  end

  def close_group
    options_stack.pop unless switching_options
    self.switching_options = false
    decrease_nesting
  end

  def open_set(token)
    token.token = :character
    nest(CharacterSet.new(token, active_opts))
  end

  def negate_set
    node.negate
  end

  def close_set
    decrease_nesting(&:close)
  end

  def range(token)
    exp = CharacterSet::Range.new(token, active_opts)
    scope = node.last.is_a?(CharacterSet::IntersectedSequence) ? node.last : node
    exp << scope.expressions.pop
    nest(exp)
  end

  def close_completed_character_set_range
    decrease_nesting if node.is_a?(CharacterSet::Range) && node.complete?
  end

  def intersection(token)
    sequence_operation(CharacterSet::Intersection, token)
  end

  def sequence_operation(klass, token)
    unless node.is_a?(klass)
      operator = klass.new(token, active_opts)
      sequence = operator.add_sequence(active_opts)
      sequence.expressions = node.expressions
      node.expressions = []
      nest(operator)
    end
    node.add_sequence(active_opts)
  end

  def active_opts
    options_stack.last
  end

  def total_captured_group_count
    captured_group_counts.values.reduce(0, :+)
  end

  def captured_group_count_at_level
    captured_group_counts[node.level]
  end

  def count_captured_group
    captured_group_counts[node.level] += 1
  end

  def assign_effective_number(exp)
    exp.effective_number =
      exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
  end

  def assign_referenced_expressions
    targets = {}
    root.each_expression do |exp|
      exp.is_a?(Group::Capture) && targets[exp.identifier] = exp
    end
    root.each_expression do |exp|
      exp.respond_to?(:reference) &&
        exp.referenced_expression = targets[exp.reference]
    end
  end
end # class Regexp::Parser

regexp_parser-1.6.0/lib/regexp_parser/expression.rb

module Regexp::Expression
  class Base
    attr_accessor :type, :token
    attr_accessor :text, :ts
    attr_accessor :level, :set_level, :conditional_level, :nesting_level

    attr_accessor :quantifier
    attr_accessor :options

    def initialize(token, options = {})
      self.type = token.type
      self.token = token.token
      self.text = token.text
      self.ts = token.ts
      self.level = token.level
      self.set_level = token.set_level
      self.conditional_level = token.conditional_level
      self.nesting_level = 0
      self.quantifier = nil
      self.options = options
    end

    def initialize_clone(orig)
      self.text = (orig.text ? orig.text.dup : nil)
      self.options = (orig.options ? orig.options.dup : nil)
      self.quantifier = (orig.quantifier ?
        orig.quantifier.clone : nil)
      super
    end

    def to_re(format = :full)
      ::Regexp.new(to_s(format))
    end

    alias :starts_at :ts

    def full_length
      to_s.length
    end

    def offset
      [starts_at, full_length]
    end

    def coded_offset
      '@%d+%d' % offset
    end

    def to_s(format = :full)
      "#{text}#{quantifier_affix(format)}"
    end

    def quantifier_affix(expression_format)
      quantifier.to_s if quantified? && expression_format != :base
    end

    def terminal?
      !respond_to?(:expressions)
    end

    def quantify(token, text, min = nil, max = nil, mode = :greedy)
      self.quantifier = Quantifier.new(token, text, min, max, mode)
    end

    def unquantified_clone
      clone.tap { |exp| exp.quantifier = nil }
    end

    def quantified?
      !quantifier.nil?
    end

    # Deprecated. Prefer `#repetitions` which has a more uniform interface.
    def quantity
      return [nil,nil] unless quantified?
      [quantifier.min, quantifier.max]
    end

    def repetitions
      return 1..1 unless quantified?
      min = quantifier.min
      max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
      # fix Range#minmax - https://bugs.ruby-lang.org/issues/15807
      (min..max).tap { |r| r.define_singleton_method(:minmax) { [min, max] } }
    end

    def greedy?
      quantified? and quantifier.greedy?
    end

    def reluctant?
      quantified? and quantifier.reluctant?
    end
    alias :lazy? :reluctant?

    def possessive?
      quantified? and quantifier.possessive?
    end

    def attributes
      {
        type: type,
        token: token,
        text: to_s(:base),
        starts_at: ts,
        length: full_length,
        level: level,
        set_level: set_level,
        conditional_level: conditional_level,
        options: options,
        quantifier: quantified? ? quantifier.to_h : nil,
      }
    end
    alias :to_h :attributes
  end

  def self.parsed(exp)
    warn('WARNING: Regexp::Expression::Base.parsed is buggy and '\
         'will be removed in 2.0.0. Use Regexp::Parser.parse instead.')
    case exp
    when String
      Regexp::Parser.parse(exp)
    when Regexp
      Regexp::Parser.parse(exp.source) # <- causes loss of root options
    when Regexp::Expression # <- never triggers
      exp
    else
      raise ArgumentError, 'Expression.parsed accepts a String, Regexp, or '\
                           'a Regexp::Expression as a value for exp, but it '\
                           "was given #{exp.class.name}."
    end
  end
end # module Regexp::Expression

require 'regexp_parser/expression/quantifier'
require 'regexp_parser/expression/subexpression'
require 'regexp_parser/expression/sequence'
require 'regexp_parser/expression/sequence_operation'

require 'regexp_parser/expression/classes/alternation'
require 'regexp_parser/expression/classes/anchor'
require 'regexp_parser/expression/classes/backref'
require 'regexp_parser/expression/classes/conditional'
require 'regexp_parser/expression/classes/escape'
require 'regexp_parser/expression/classes/free_space'
require 'regexp_parser/expression/classes/group'
require 'regexp_parser/expression/classes/keep'
require 'regexp_parser/expression/classes/literal'
require 'regexp_parser/expression/classes/posix_class'
require 'regexp_parser/expression/classes/property'
require 'regexp_parser/expression/classes/root'
require 'regexp_parser/expression/classes/set'
require 'regexp_parser/expression/classes/set/intersection'
require 'regexp_parser/expression/classes/set/range'
require 'regexp_parser/expression/classes/type'

require 'regexp_parser/expression/methods/match'
require 'regexp_parser/expression/methods/match_length'
require 'regexp_parser/expression/methods/options'
require 'regexp_parser/expression/methods/strfregexp'
require 'regexp_parser/expression/methods/tests'
require 'regexp_parser/expression/methods/traverse'

regexp_parser-1.6.0/lib/regexp_parser.rb

# encoding: utf-8

require 'regexp_parser/version'
require 'regexp_parser/token'
require 'regexp_parser/scanner'
require 'regexp_parser/syntax'
require 'regexp_parser/lexer'
require 'regexp_parser/parser'

regexp_parser-1.6.0/Gemfile

source 'https://rubygems.org'

gemspec

group :development, :test do
  gem 'rake', '~> 12.2'
  gem 'regexp_property_values', '~> 1.0'
  gem 'rspec', '~> 3.8'
end

regexp_parser-1.6.0/regexp_parser.gemspec

$:.unshift File.join(File.dirname(__FILE__), 'lib')

require 'regexp_parser/version'

Gem::Specification.new do |gem|
  gem.name = 'regexp_parser'
  gem.version = ::Regexp::Parser::VERSION

  gem.summary = "Scanner, lexer, parser for ruby's regular expressions"
  gem.description = 'A library for tokenizing, lexing, and parsing Ruby regular expressions.'
  gem.homepage = 'https://github.com/ammar/regexp_parser'

  if gem.respond_to?(:metadata)
    gem.metadata = { 'issue_tracker' => 'https://github.com/ammar/regexp_parser/issues' }
  end

  gem.authors = ['Ammar Ali']
  gem.email = ['ammarabuali@gmail.com']

  gem.license = 'MIT'

  gem.require_paths = ['lib']

  gem.files = Dir.glob('{lib,spec}/**/*.rb') +
              Dir.glob('lib/**/*.rl') +
              Dir.glob('lib/**/*.yml') +
              %w(Gemfile Rakefile LICENSE README.md CHANGELOG.md regexp_parser.gemspec)

  gem.test_files = Dir.glob('spec/**/*.rb')

  gem.rdoc_options = ["--inline-source", "--charset=UTF-8"]

  gem.platform = Gem::Platform::RUBY

  gem.required_ruby_version = '>= 1.9.1'
end