citrus-3.0.2/README.md

Citrus :: Parsing Expressions for Ruby

Citrus is a compact and powerful parsing library for [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of the language with the simplicity and power of [parsing expressions](http://en.wikipedia.org/wiki/Parsing_expression_grammar).

# Installation

Via [RubyGems](http://rubygems.org/):

    $ gem install citrus

From a local copy:

    $ git clone git://github.com/mjackson/citrus.git
    $ cd citrus
    $ rake package install

# Background

In order to be able to use Citrus effectively, you must first understand the difference between syntax and semantics. Syntax is a set of rules that govern the way letters and punctuation may be used in a language. For example, English syntax dictates that proper nouns should start with a capital letter and that sentences should end with a period.

Semantics are the rules by which meaning may be derived in a language. For example, as you read a book you are able to make some sense of the particular way in which words on a page are combined to form thoughts and express ideas because you understand what the words themselves mean and you understand what they mean collectively.

Computers use a similar process when interpreting code. First, the code must be parsed into recognizable symbols or tokens. These tokens may then be passed to an interpreter which is responsible for forming actual instructions from them.

Citrus is a pure Ruby library that allows you to perform both lexical analysis and semantic interpretation quickly and easily. Using Citrus you can write powerful parsers that are simple to understand and easy to create and maintain.

In Citrus, there are three main types of objects: rules, grammars, and matches.

## Rules

A [Rule](http://mjackson.github.io/citrus/api/classes/Citrus/Rule.html) is an object that specifies some matching behavior on a string. There are two types of rules: terminals and non-terminals. Terminals can be either Ruby strings or regular expressions that specify some input to match. For example, a terminal created from the string "end" would match any sequence of the characters "e", "n", and "d", in that order. Terminals created from regular expressions may match any sequence of characters that can be generated from that expression.

Non-terminals are rules that may contain other rules but do not themselves match directly on the input. For example, a Repeat is a non-terminal that may contain one other rule that will try to match a certain number of times. Several other types of non-terminals are available that will be discussed later.

Rule objects may also have semantic information associated with them in the form of Ruby modules. Rules use these modules to extend the matches they create.

## Grammars

A [Grammar](http://mjackson.github.io/citrus/api/classes/Citrus/Grammar.html) is a container for rules. Usually the rules in a grammar collectively form a complete specification for some language, or a well-defined subset thereof.

A Citrus grammar is really just a souped-up Ruby [module](http://ruby-doc.org/core/classes/Module.html). These modules may be included in other grammar modules in the same way that Ruby modules are normally used. This property allows you to divide a complex grammar into more manageable, reusable pieces that may be combined at runtime. Any rule with the same name as a rule in an included grammar may access that rule with a mechanism similar to Ruby's `super` keyword.

## Matches

A [Match](http://mjackson.github.io/citrus/api/classes/Citrus/Match.html) object represents a successful recognition of some piece of the input. Matches are created by rule objects during a parse.

Matches are arranged in a tree structure where any match may contain any number of other matches. Each match contains information about its own subtree. The structure of the tree is determined by the way in which the rule that generated each match is used in the grammar. For example, a match that is created from a non-terminal rule that contains several other terminals will likewise contain several matches, one for each terminal. However, this is an implementation detail and should be relatively transparent to the user.

Match objects may be extended with semantic information in the form of methods. These methods should provide various interpretations for the semantic value of a match.

# Syntax

The most straightforward way to compose a Citrus grammar is to use Citrus' own custom grammar syntax. This syntax borrows heavily from Ruby, so it should already be familiar to Ruby programmers.

## Terminals

Terminals may be represented by a string or a regular expression. Both follow the same rules as Ruby string and regular expression literals.

    'abc'         # match "abc"
    "abc\n"       # match "abc\n"
    /abc/i        # match "abc" in any case
    /\xFF/        # match "\xFF"

Character classes and the dot (match anything) symbol are supported as well for compatibility with other parsing expression implementations.

    [a-z0-9]      # match any lowercase letter or digit
    [\x00-\xFF]   # match any octet
    .             # match any single character, including new lines

Also, strings may use backticks instead of quotes to indicate that they should match in a case-insensitive manner.

    `abc`         # match "abc" in any case

Besides case sensitivity, case-insensitive strings have the same behavior as double quoted strings.

See [Terminal](http://mjackson.github.io/citrus/api/classes/Citrus/Terminal.html) and [StringTerminal](http://mjackson.github.io/citrus/api/classes/Citrus/StringTerminal.html) for more information.

## Repetition

Quantifiers may be used after any expression to specify a number of times it must match.
The universal form of a quantifier is `N*M` where `N` is the minimum and `M` is the maximum number of times the expression may match.

    'abc'1*2      # match "abc" a minimum of one, maximum of two times
    'abc'1*       # match "abc" at least once
    'abc'*2       # match "abc" a maximum of twice

Additionally, the minimum and maximum may be omitted entirely to specify that an expression may match zero or more times.

    'abc'*        # match "abc" zero or more times

The `+` and `?` operators are supported as well for the common cases of `1*` and `*1` respectively.

    'abc'+        # match "abc" one or more times
    'abc'?        # match "abc" zero or one time

See [Repeat](http://mjackson.github.io/citrus/api/classes/Citrus/Repeat.html) for more information.

## Lookahead

Both positive and negative lookahead are supported in Citrus. Use the `&` and `!` operators to indicate that an expression either should or should not match. In neither case is any input consumed.

    'a' &'b'      # match an "a" that is followed by a "b"
    'a' !'b'      # match an "a" that is not followed by a "b"
    !'a' .        # match any character except for "a"

A special form of lookahead is also supported which will match any character that does not match a given expression.

    ~'a'          # match all characters until an "a"
    ~/xyz/        # match all characters until /xyz/ matches

When using this operator (the tilde), at least one character must be consumed for the rule to succeed.

See [AndPredicate](http://mjackson.github.io/citrus/api/classes/Citrus/AndPredicate.html), [NotPredicate](http://mjackson.github.io/citrus/api/classes/Citrus/NotPredicate.html), and [ButPredicate](http://mjackson.github.io/citrus/api/classes/Citrus/ButPredicate.html) for more information.

## Sequences

Sequences of expressions may be separated by a space to indicate that the rules should match in that order.

    'a' 'b' 'c'   # match "a", then "b", then "c"
    'a' [0-9]     # match "a", then a numeric digit

See [Sequence](http://mjackson.github.io/citrus/api/classes/Citrus/Sequence.html) for more information.
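
As a rough illustration of what the quantifier and lookahead operators described above mean, the following plain-Ruby sketch mimics them with `StringScanner` from the standard library. It does not use Citrus and is not how Citrus implements these rules; the helper names `repeat`, `followed_by?`, and `not_followed_by?` are hypothetical and exist only for this example.

```ruby
require 'strscan'

# Hypothetical helper (not part of Citrus): greedily matches +pattern+ at the
# scanner's current position at least +min+ and at most +max+ times, like the
# N*M quantifier. Returns the consumed text, or nil if fewer than +min+
# repetitions matched (in which case no input is consumed).
def repeat(scanner, pattern, min = 0, max = Float::INFINITY)
  start = scanner.pos
  count = 0

  while count < max
    before = scanner.pos
    break unless scanner.scan(pattern)
    count += 1
    break if scanner.pos == before # guard against looping on empty matches
  end

  if count < min
    scanner.pos = start
    return nil
  end

  scanner.string.byteslice(start, scanner.pos - start)
end

# Hypothetical helpers mimicking the & and ! predicates: both test +pattern+
# at the current position without consuming any input.
def followed_by?(scanner, pattern)
  !scanner.match?(pattern).nil?
end

def not_followed_by?(scanner, pattern)
  scanner.match?(pattern).nil?
end

scanner = StringScanner.new('abcabcabc')
matched = repeat(scanner, /abc/, 1, 2)   # comparable to 'abc'1*2
more    = followed_by?(scanner, /abc/)   # comparable to &'abc'
puts matched, more, scanner.pos
```

The point of the sketch is that a quantifier either consumes between `N` and `M` repetitions or fails without consuming anything, while the predicates only inspect the input and leave the position where it was.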

## Choices

Ordered choice is indicated by a vertical bar that separates two expressions. When using choice, each expression is tried in order. When one matches, the rule returns the match immediately without trying the remaining rules.

    'a' | 'b'       # match "a" or "b"
    'a' 'b' | 'c'   # match "a" then "b" (in sequence), or "c"

It is important to note when using ordered choice that any operator binds more tightly than the vertical bar. A full chart of operators and their respective levels of precedence is below.

See [Choice](http://mjackson.github.io/citrus/api/classes/Citrus/Choice.html) for more information.

## Labels

Match objects may be referred to by a different name than the rule that originally generated them. Labels are added by placing the label and a colon immediately preceding any expression.

    chars:/[a-z]+/  # the characters matched by the regular expression
                    # may be referred to as "chars" in an extension
                    # method

## Extensions

Extensions may be specified using either "module" or "block" syntax. When using module syntax, specify the name of a module that is used to extend match objects in between less than and greater than symbols.

    [a-z0-9]5*9 <CouponCode>  # match a string that consists of any lower
                              # cased letter or digit between 5 and 9
                              # times and extend the match with the
                              # CouponCode module

Additionally, extensions may be specified inline using curly braces. When using this method, the code inside the curly braces may be invoked by calling the `value` method on the match object.

    [0-9] { to_str.to_i }     # match any digit and return its integer value when
                              # calling the #value method on the match object

Note that when using the inline block method you may also specify arguments in between vertical bars immediately following the opening curly brace, just like in Ruby blocks.

## Super

When including a grammar inside another, all rules in the child that have the same name as a rule in the parent also have access to the `super` keyword to invoke the parent rule.

    grammar Number
      rule number
        [0-9]+
      end
    end

    grammar FloatingPoint
      include Number

      rule number
        super ('.' super)?
      end
    end

In the example above, the `FloatingPoint` grammar includes `Number`. Both have a rule named `number`, so `FloatingPoint#number` has access to `Number#number` by means of using `super`.

See [Super](http://mjackson.github.io/citrus/api/classes/Citrus/Super.html) for more information.

## Precedence

The following table contains a list of all Citrus symbols and operators and their precedence. A higher precedence indicates tighter binding.

Operator                  | Name                      | Precedence
------------------------- | ------------------------- | ----------
`''`                      | String (single quoted)    | 7
`""`                      | String (double quoted)    | 7
``` `` ```                | String (case insensitive) | 7
`[]`                      | Character class           | 7
`.`                       | Dot (any character)       | 7
`//`                      | Regular expression        | 7
`()`                      | Grouping                  | 7
`*`                       | Repetition (arbitrary)    | 6
`+`                       | Repetition (one or more)  | 6
`?`                       | Repetition (zero or one)  | 6
`&`                       | And predicate             | 5
`!`                       | Not predicate             | 5
`~`                       | But predicate             | 5
`<>`                      | Extension (module name)   | 4
`{}`                      | Extension (literal)       | 4
`:`                       | Label                     | 3
`e1 e2`                   | Sequence                  | 2
`e1 \| e2`                | Ordered choice            | 1

## Grouping

As is common in many programming languages, parentheses may be used to override the normal binding order of operators. In the following example parentheses are used to make the vertical bar between `'b'` and `'c'` bind tighter than the space between `'a'` and `'b'`.

    'a' ('b' | 'c')   # match "a", then "b" or "c"

# Example

Below is an example of a simple grammar that is able to parse strings of integers separated by any amount of white space and a `+` symbol.

    grammar Addition
      rule additive
        number plus (additive | number)
      end

      rule number
        [0-9]+ space
      end

      rule plus
        '+' space
      end

      rule space
        [ \t]*
      end
    end

Several things to note about the above example:

  * Grammar and rule declarations end with the `end` keyword
  * A sequence of rules is created by separating expressions with a space
  * Likewise, ordered choice is represented with a vertical bar
  * Parentheses may be used to override the natural binding order
  * Rules may refer to other rules in their own definitions simply by using the other rule's name
  * Any expression may be followed by a quantifier

## Interpretation

The grammar above is able to parse simple mathematical expressions such as "1+2" and "1 + 2+3", but it does not have enough semantic information to be able to actually interpret these expressions.

At this point, when the grammar parses a string it generates a tree of [Match](http://mjackson.github.io/citrus/api/classes/Citrus/Match.html) objects. Each match is created by a rule and may itself be composed of any number of submatches.

Submatches are created whenever a rule contains another rule. For example, in the grammar above `number` matches a string of digits followed by white space. Thus, a match generated by this rule will contain two submatches.

We can define a method inside a set of curly braces that will be used to extend a particular rule's matches. This works in similar fashion to using Ruby's blocks. Let's extend the `Addition` grammar using this technique.

    grammar Addition
      rule additive
        (number plus term:(additive | number)) {
          capture(:number).value + capture(:term).value
        }
      end

      rule number
        ([0-9]+ space) {
          to_str.to_i
        }
      end

      rule plus
        '+' space
      end

      rule space
        [ \t]*
      end
    end

In this version of the grammar we have added two semantic blocks, one each for the `additive` and `number` rules. These blocks contain code that we can execute by calling `value` on match objects that result from those rules.
It's easiest to explain what is going on here by starting with the lowest level block, which is defined within `number`.

Inside this block we see a call to another method, namely `to_str`. When called in the context of a match object, this method returns the match's internal string object. Thus, the call to `to_str.to_i` should return the integer value of the match.

Similarly, matches created by `additive` will also have a `value` method. Notice the use of the `term` label within the rule definition. This label allows the match that is created by the choice between `additive` and `number` to be retrieved using `capture(:term)`. The value of an additive match is determined to be the values of its `number` and `term` matches added together using Ruby's addition operator.

Note that the plural form `captures(:term)` can be used to get an array of matches for a given label (e.g. when the label belongs to a repetition).

Since `additive` is the first rule defined in the grammar, any match that results from parsing a string with this grammar will have a `value` method that can be used to recursively calculate the collective value of the entire match tree.

To give it a try, save the code for the `Addition` grammar in a file called addition.citrus. Next, assuming you have the Citrus [gem](https://rubygems.org/gems/citrus) installed, try the following sequence of commands in a terminal.

    $ irb
    > require 'citrus'
     => true
    > Citrus.load 'addition'
     => [Addition]
    > m = Addition.parse '1 + 2 + 3'
     => #<Citrus::Match ...>
    > m.value
     => 6

Congratulations! You just ran your first piece of Citrus code.

One interesting thing to notice about the above sequence of commands is the return value of [Citrus#load](http://mjackson.github.io/citrus/api/classes/Citrus.html#M000003).
When you use `Citrus.load` to load a grammar file (and likewise [Citrus#eval](http://mjackson.github.io/citrus/api/classes/Citrus.html#M000004) to evaluate a raw string of grammar code), the return value is an array of all the grammars present in that file.

Take a look at [calc.citrus](http://github.com/mjackson/citrus/blob/master/lib/citrus/grammars/calc.citrus) for an example of a calculator that is able to parse and evaluate more complex mathematical expressions.

## Additional Methods

If you need more than just a `value` method on your match object, you can attach additional methods as well. There are two ways to do this.

The first lets you define additional methods inline in your semantic block. This block will be used to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682). Using the `Addition` example above, we might refactor the `additive` rule to look like this:

    rule additive
      (number plus term:(additive | number)) {
        def lhs
          capture(:number).value
        end

        def rhs
          capture(:term).value
        end

        def value
          lhs + rhs
        end
      }
    end

Now, in addition to having a `value` method, matches that result from the `additive` rule will have an `lhs` and an `rhs` method as well. Although not particularly useful in this example, this technique can be useful when unit testing more complex rules. For example, using this method you might make the following assertions in a unit test:

    match = Addition.parse('1 + 4')
    assert_equal(1, match.lhs)
    assert_equal(4, match.rhs)
    assert_equal(5, match.value)

The second way is to move the code out of the semantic block: create a separate Ruby module (in another file) that contains the extension methods you want and use the angle bracket notation to indicate that a rule should use that module when extending matches.

To demonstrate this method with the above example, in a Ruby file you would define the following module.

    module Additive
      def lhs
        capture(:number).value
      end

      def rhs
        capture(:term).value
      end

      def value
        lhs + rhs
      end
    end

Then, in your Citrus grammar file the rule definition would look like this:

    rule additive
      (number plus term:(additive | number)) <Additive>
    end

This method of defining extensions can help keep your grammar files cleaner. However, you do need to make sure that your extension modules are already loaded before using `Citrus.load` to load your grammar file.

# Testing

Citrus was designed to facilitate simple and powerful testing of grammars. To demonstrate how this is to be done, we'll use the `Addition` grammar from our previous [example](example.html). The following code demonstrates a simple test case that could be used to test that our grammar works properly.

    class AdditionTest < Test::Unit::TestCase
      def test_additive
        match = Addition.parse('23 + 12', :root => :additive)
        assert(match)
        assert_equal('23 + 12', match)
        assert_equal(35, match.value)
      end

      def test_number
        match = Addition.parse('23', :root => :number)
        assert(match)
        assert_equal('23', match)
        assert_equal(23, match.value)
      end
    end

The key here is using the `:root` option when performing the parse to specify the name of the rule at which the parse should start. In `test_number`, since `:number` was given the parse will start at that rule as if it were the root rule of the entire grammar. The ability to change the root rule on the fly like this enables easy unit testing of the entire grammar.

Also note that because match objects are themselves strings, assertions may be made to test equality of match objects with string values.

## Debugging

When a parse fails, a [ParseError](http://mjackson.github.io/citrus/api/classes/Citrus/ParseError.html) object is generated which provides a wealth of information about exactly where the parse failed including the offset, line number, line text, and line offset. Using this object, you can give the user useful feedback about why the input was bad.
The following code demonstrates one way to do this.

    def parse_some_stuff(stuff)
      match = StuffGrammar.parse(stuff)
    rescue Citrus::ParseError => e
      raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
        [e.line_number, e.line_offset]
    end

In addition to useful error objects, Citrus also includes a means of visualizing match trees in the console via `Match#dump`. This can help when determining which rules are generating which matches and how they are organized in the match tree.

# Extras

Several files are included in the Citrus repository that make it easier to work with grammar files in various editors.

## TextMate

To install the Citrus [TextMate](http://macromates.com/) bundle, simply double-click on the `Citrus.tmbundle` file in the `extras` directory.

## Vim

To install the [Vim](http://www.vim.org/) scripts, copy the files in `extras/vim` to a directory in Vim's [runtimepath](http://vimdoc.sourceforge.net/htmldoc/options.html#\'runtimepath\').

# Examples

The project source directory contains several example scripts that demonstrate how grammars are to be constructed and used. Each Citrus file in the examples directory has an accompanying Ruby file that contains a suite of tests for that particular file.

The best way to run any of these examples is to pass the name of the Ruby file directly to the Ruby interpreter on the command line, e.g.:

    $ ruby -Ilib examples/calc_test.rb

This particular invocation uses the `-I` flag to ensure that you are using the version of Citrus that was bundled with that particular example file (i.e. the version that is contained in the `lib` directory).

# Links

Discussion around Citrus happens on the [citrus-users Google group](http://groups.google.com/group/citrus-users).

The primary resource for all things to do with parsing expressions can be found on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat) at MIT.
Also, a useful summary of parsing expression grammars can be found on [Wikipedia](http://en.wikipedia.org/wiki/Parsing_expression_grammar).

Citrus draws inspiration from another Ruby library for writing parsing expression grammars, Treetop. While Citrus' syntax is similar to that of [Treetop](http://treetop.rubyforge.org), it's not identical. The link is included here for those who may wish to explore an alternative implementation.

# License

Copyright 2010-2011 Michael Jackson

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

The software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

citrus-3.0.2/lib/citrus.rb

# encoding: UTF-8

require 'strscan'
require 'pathname'
require 'citrus/version'

# Citrus is a compact and powerful parsing library for Ruby that combines the
# elegance and expressiveness of the language with the simplicity and power of
# parsing expressions.
#
# http://mjackson.github.io/citrus
module Citrus
  autoload :File, 'citrus/file'

  # A pattern to match any character, including newline.
  DOT = /./mu

  Infinity = 1.0 / 0

  CLOSE = -1

  # Returns a map of paths of files that have been loaded via #load to the
  # result of #eval on the code in that file.
  #
  # Note: These paths are not absolute unless you pass an absolute path to
  # #load. That means that if you change the working directory and try to
  # #require the same file with a different relative path, it will be loaded
  # twice.
  def self.cache
    @cache ||= {}
  end

  # Evaluates the given Citrus parsing expression grammar +code+ and returns an
  # array of any grammar modules that are created. Accepts the same +options+ as
  # GrammarMethods#parse.
  #
  #     Citrus.eval(<<CITRUS)
  #     grammar MyGrammar
  #       ...
  #     end
  #     CITRUS
  #     # => [MyGrammar]
  #
  def self.eval(code, options={})
    File.parse(code, options).value
  end

  # Evaluates the given expression and creates a new Rule object from it.
  # Accepts the same +options+ as #eval.
  #
  #     Citrus.rule('"a" | "b"')
  #     # => #<Citrus::Rule: ... >
  #
  def self.rule(expr, options={})
    eval(expr, options.merge(:root => :expression))
  end

  # Loads the grammar(s) from the given +file+. Accepts the same +options+ as
  # #eval, plus the following:
  #
  # force::   Normally this method will not reload a file that is already in
  #           the #cache. However, if this option is +true+ the file will be
  #           loaded, regardless of whether or not it is in the cache. Defaults
  #           to +false+.
  #
  #     Citrus.load('mygrammar')
  #     # => [MyGrammar]
  #
  def self.load(file, options={})
    file += '.citrus' unless /\.citrus$/ === file
    force = options.delete(:force)

    if force || !cache[file]
      begin
        cache[file] = eval(::File.read(file), options)
      rescue SyntaxError => e
        e.message.replace("#{::File.expand_path(file)}: #{e.message}")
        raise e
      end
    end

    cache[file]
  end

  # Searches the $LOAD_PATH for a +file+ with the .citrus suffix and
  # attempts to load it via #load. Returns the path to the file that was loaded
  # on success and raises a LoadError if the file cannot be found. Accepts the
  # same +options+ as #load.
  #
  #     path = Citrus.require('mygrammar')
  #     # => "/path/to/mygrammar.citrus"
  #     Citrus.cache[path]
  #     # => [MyGrammar]
  #
  def self.require(file, options={})
    file += '.citrus' unless /\.citrus$/ === file
    found = nil

    paths = ['']
    paths += $LOAD_PATH unless Pathname.new(file).absolute?
    paths.each do |path|
      found = Dir[::File.join(path, file)].first
      break if found
    end

    if found
      Citrus.load(found, options)
    else
      raise LoadError, "Cannot find file #{file}"
    end

    found
  end

  # A base class for all Citrus errors.
  class Error < StandardError; end

  # Raised when Citrus.require can't find the file to load.
  class LoadError < Error; end

  # Raised when a parse fails.
  class ParseError < Error
    # The +input+ given here is an instance of Citrus::Input.
    def initialize(input)
      @offset = input.max_offset
      @line_offset = input.line_offset(offset)
      @line_number = input.line_number(offset)
      @line = input.line(offset)

      message = "Failed to parse input on line #{line_number}"
      message << " at offset #{line_offset}\n#{detail}"

      super(message)
    end

    # The 0-based offset at which the error occurred in the input, i.e. the
    # maximum offset in the input that was successfully parsed before the error
    # occurred.
    attr_reader :offset

    # The 0-based offset at which the error occurred on the line on which it
    # occurred in the input.
    attr_reader :line_offset

    # The 1-based number of the line in the input where the error occurred.
    attr_reader :line_number

    # The text of the line in the input where the error occurred.
    attr_reader :line

    # Returns a string that, when printed, gives a visual representation of
    # exactly where the error occurred on its line in the input.
    def detail
      "#{line}\n#{' ' * line_offset}^"
    end
  end

  # Raised when Citrus::File.parse fails.
  class SyntaxError < Error
    # The +error+ given here is an instance of Citrus::ParseError.
    def initialize(error)
      message = "Malformed Citrus syntax on line #{error.line_number}"
      message << " at offset #{error.line_offset}\n#{error.detail}"

      super(message)
    end
  end

  # An Input is a scanner that is responsible for executing rules at different
  # positions in the input string and persisting event streams.
  class Input < StringScanner
    def initialize(source)
      super(source_text(source))
      @source = source
      @max_offset = 0
    end

    # The maximum offset in the input that was successfully parsed.
    attr_reader :max_offset

    # The initial source passed at construction. Typically a String
    # or a Pathname.
    attr_reader :source

    def reset # :nodoc:
      @max_offset = 0
      super
    end

    # Returns an array containing the lines of text in the input.
    def lines
      if string.respond_to?(:lines)
        string.lines.to_a
      else
        string.to_a
      end
    end

    # Returns the 0-based offset of the given +pos+ in the input on the line
    # on which it is found. +pos+ defaults to the current pointer position.
    def line_offset(pos=pos())
      p = 0
      string.each_line do |line|
        len = line.length
        return (pos - p) if p + len >= pos
        p += len
      end
      0
    end

    # Returns the 0-based number of the line that contains the character at the
    # given +pos+. +pos+ defaults to the current pointer position.
    def line_index(pos=pos())
      p = n = 0
      string.each_line do |line|
        p += line.length
        return n if p >= pos
        n += 1
      end
      0
    end

    # Returns the 1-based number of the line that contains the character at the
    # given +pos+. +pos+ defaults to the current pointer position.
    def line_number(pos=pos())
      line_index(pos) + 1
    end

    alias_method :lineno, :line_number

    # Returns the text of the line that contains the character at the given
    # +pos+. +pos+ defaults to the current pointer position.
    def line(pos=pos())
      lines[line_index(pos)]
    end

    # Returns +true+ when using memoization to cache match results.
    def memoized?
      false
    end

    # Returns an array of events for the given +rule+ at the current pointer
    # position.
    # Objects in this array may be one of three types: a Rule,
    # Citrus::CLOSE, or a length (integer).
    def exec(rule, events=[])
      position = pos
      index = events.size

      if apply_rule(rule, position, events).size > index
        @max_offset = pos if pos > @max_offset
      else
        self.pos = position
      end

      events
    end

    # Returns the length of a match for the given +rule+ at the current pointer
    # position, +nil+ if none can be made.
    def test(rule)
      position = pos
      events = apply_rule(rule, position, [])
      self.pos = position
      events[-1]
    end

    # Returns the scanned string.
    alias_method :to_str, :string

  private

    # Returns the text to parse from +source+.
    def source_text(source)
      if source.respond_to?(:to_path)
        ::File.read(source.to_path)
      elsif source.respond_to?(:read)
        source.read
      elsif source.respond_to?(:to_str)
        source.to_str
      else
        raise ArgumentError, "Unable to parse from #{source}", caller
      end
    end

    # Appends all events for +rule+ at the given +position+ to +events+.
    def apply_rule(rule, position, events)
      rule.exec(self, events)
    end
  end

  # A MemoizedInput is an Input that caches segments of the event stream for
  # particular rules in a parse. This technique (also known as "Packrat"
  # parsing) guarantees parsers will operate in linear time but costs
  # significantly more in terms of time and memory required to perform a parse.
  # For more information, please read the paper on Packrat parsing at
  # http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
  class MemoizedInput < Input
    def initialize(string)
      super(string)
      @cache = {}
      @cache_hits = 0
    end

    # A nested hash of rules to offsets and their respective matches.
    attr_reader :cache

    # The number of times the cache was hit.
    attr_reader :cache_hits

    def reset # :nodoc:
      @cache.clear
      @cache_hits = 0
      super
    end

    # Returns +true+ when using memoization to cache match results.
    def memoized?
      true
    end

  private

    def apply_rule(rule, position, events) # :nodoc:
      memo = @cache[rule] ||= {}

      if memo[position]
        @cache_hits += 1
        c = memo[position]
        unless c.empty?
          events.concat(c)
          self.pos += events[-1]
        end
      else
        index = events.size
        rule.exec(self, events)

        # Memoize the result so we can use it next time this same rule is
        # executed at this position.
        memo[position] = events.slice(index, events.size)
      end

      events
    end
  end

  # Inclusion of this module into another extends the receiver with the grammar
  # helper methods in GrammarMethods. Although this module does not actually
  # provide any methods, constants, or variables to modules that include it, the
  # mere act of inclusion provides a useful lookup mechanism to determine if a
  # module is in fact a grammar.
  module Grammar
    # Creates a new anonymous module that includes Grammar. If a +block+ is
    # provided, it is +module_eval+'d in the context of the new module. Grammars
    # created with this method may be assigned a name by being assigned to some
    # constant, e.g.:
    #
    #     MyGrammar = Citrus::Grammar.new {}
    #
    def self.new(&block)
      mod = Module.new { include Grammar }
      mod.module_eval(&block) if block
      mod
    end

    # Extends all modules that +include Grammar+ with GrammarMethods and
    # exposes Module#include.
    def self.included(mod)
      mod.extend(GrammarMethods)
      # Expose #include so it can be called publicly.
      class << mod; public :include end
    end
  end

  # Contains methods that are available to Grammar modules at the class level.
  module GrammarMethods
    def self.extend_object(obj)
      raise ArgumentError, "Grammars must be Modules" unless Module === obj
      super
    end

    # Parses the given +source+ using this grammar's root rule. Accepts the same
    # +options+ as Rule#parse, plus the following:
    #
    # root::    The name of the root rule to start parsing at. Defaults to this
    #           grammar's #root.
    def parse(source, options={})
      rule_name = options.delete(:root) || root
      raise Error, "No root rule specified" unless rule_name
      rule = rule(rule_name)
      raise Error, "No rule named \"#{rule_name}\"" unless rule
      rule.parse(source, options)
    end

    # Parses the contents of the file at the given +path+ using this grammar's
    # #root rule.
Accepts the same +options+ as #parse. def parse_file(path, options={}) path = Pathname.new(path.to_str) unless Pathname === path parse(path, options) end # Returns the name of this grammar as a string. def name super.to_s end # Returns an array of all grammars that have been included in this grammar # in the reverse order they were included. def included_grammars included_modules.select {|mod| mod.include?(Grammar) } end # Returns an array of all names of rules in this grammar as symbols ordered # in the same way they were declared. def rule_names @rule_names ||= [] end # Returns a hash of all Rule objects in this grammar, keyed by rule name. def rules @rules ||= {} end # Returns +true+ if this grammar has a rule with the given +name+. def has_rule?(name) rules.key?(name.to_sym) end # Loops through the rule tree for the given +rule+ looking for any Super # rules. When it finds one, it sets that rule's rule name to the given # +name+. def setup_super(rule, name) # :nodoc: if Nonterminal === rule rule.rules.each {|r| setup_super(r, name) } elsif Super === rule rule.rule_name = name end end private :setup_super # Searches the inheritance hierarchy of this grammar for a rule named +name+ # and returns it on success. Returns +nil+ on failure. def super_rule(name) sym = name.to_sym included_grammars.each do |grammar| rule = grammar.rule(sym) return rule if rule end nil end # Gets/sets the rule with the given +name+. If +obj+ is given the rule # will be set to the value of +obj+ passed through Rule.for. If a block is # given, its return value will be used for the value of +obj+. # # It is important to note that this method will also check any included # grammars for a rule with the given +name+ if one cannot be found in this # grammar. 
def rule(name, obj=nil, &block) sym = name.to_sym obj = block.call if block if obj rule_names << sym unless has_rule?(sym) rule = Rule.for(obj) rule.name = name setup_super(rule, name) rule.grammar = self rules[sym] = rule end rules[sym] || super_rule(sym) rescue => e e.message.replace("Cannot create rule \"#{name}\": #{e.message}") raise e end # Gets/sets the +name+ of the root rule of this grammar. If no root rule is # explicitly specified, the name of this grammar's first rule is returned. def root(name=nil) if name @root = name.to_sym else # The first rule in a grammar is the default root. if instance_variable_defined?(:@root) @root else rule_names.first end end end # Creates a new rule that will match any single character. A block may be # provided to specify semantic behavior (via #ext). def dot(&block) ext(Rule.for(DOT), block) end # Creates a new Super for the rule currently being defined in the grammar. A # block may be provided to specify semantic behavior (via #ext). def sup(&block) ext(Super.new, block) end # Creates a new AndPredicate using the given +rule+. A block may be provided # to specify semantic behavior (via #ext). def andp(rule, &block) ext(AndPredicate.new(rule), block) end # Creates a new NotPredicate using the given +rule+. A block may be provided # to specify semantic behavior (via #ext). def notp(rule, &block) ext(NotPredicate.new(rule), block) end # Creates a new ButPredicate using the given +rule+. A block may be provided # to specify semantic behavior (via #ext). def butp(rule, &block) ext(ButPredicate.new(rule), block) end # Creates a new Repeat using the given +rule+. +min+ and +max+ specify the # minimum and maximum number of times the rule must match. A block may be # provided to specify semantic behavior (via #ext). def rep(rule, min=1, max=Infinity, &block) ext(Repeat.new(rule, min, max), block) end # An alias for #rep. def one_or_more(rule, &block) rep(rule, &block) end # An alias for #rep with a minimum of 0. 
def zero_or_more(rule, &block) rep(rule, 0, &block) end # An alias for #rep with a minimum of 0 and a maximum of 1. def zero_or_one(rule, &block) rep(rule, 0, 1, &block) end # Creates a new Sequence using all arguments. A block may be provided to # specify semantic behavior (via #ext). def all(*args, &block) ext(Sequence.new(args), block) end # Creates a new Choice using all arguments. A block may be provided to # specify semantic behavior (via #ext). def any(*args, &block) ext(Choice.new(args), block) end # Adds +label+ to the given +rule+. A block may be provided to specify # semantic behavior (via #ext). def label(rule, label, &block) rule = ext(rule, block) rule.label = label rule end # Specifies a Module that will be used to extend all matches created with # the given +rule+. A block may also be given that will be used to create # an anonymous module. See Rule#extension=. def ext(rule, mod=nil, &block) rule = Rule.for(rule) mod = block if block rule.extension = mod if mod rule end # Creates a new Module from the given +block+ and sets it to be the # extension of the given +rule+. See Rule#extension=. def mod(rule, &block) rule.extension = Module.new(&block) rule end end # A Rule is an object that is used by a grammar to create matches on an # Input during parsing. module Rule # Returns a new Rule object depending on the type of object given. def self.for(obj) case obj when Rule then obj when Symbol then Alias.new(obj) when String then StringTerminal.new(obj) when Regexp then Terminal.new(obj) when Array then Sequence.new(obj) when Range then Choice.new(obj.to_a) when Numeric then StringTerminal.new(obj.to_s) else raise ArgumentError, "Invalid rule object: #{obj.inspect}" end end # The grammar this rule belongs to, if any. attr_accessor :grammar # Sets the name of this rule. def name=(name) @name = name.to_sym end # The name of this rule. attr_reader :name # Sets the label of this rule. def label=(label) @label = label.to_sym end # A label for this rule. 
If a rule has a label, all matches that it creates # will be accessible as named captures from the scope of their parent match # using that label. attr_reader :label # Specifies a module that will be used to extend all Match objects that # result from this rule. If +mod+ is a Proc, it is used to create an # anonymous module with a +value+ method. def extension=(mod) if Proc === mod mod = Module.new { define_method(:value, &mod) } end raise ArgumentError, "Extension must be a Module" unless Module === mod @extension = mod end # The module this rule uses to extend new matches. attr_reader :extension # The default set of options to use when calling #parse. def default_options # :nodoc: { :consume => true, :memoize => false, :offset => 0 } end # Attempts to parse the given +string+ and return a Match if any can be # made. +options+ may contain any of the following keys: # # consume:: If this is +true+ a ParseError will be raised unless the # entire input string is consumed. Defaults to +true+. # memoize:: If this is +true+ the matches generated during a parse are # memoized. See MemoizedInput for more information. Defaults to # +false+. # offset:: The offset in +string+ at which to start parsing. Defaults # to 0. def parse(source, options={}) opts = default_options.merge(options) input = (opts[:memoize] ? MemoizedInput : Input).new(source) string = input.string input.pos = opts[:offset] if opts[:offset] > 0 events = input.exec(self) length = events[-1] if !length || (opts[:consume] && length < (string.length - opts[:offset])) raise ParseError, input end Match.new(input, events, opts[:offset]) end # Tests whether or not this rule matches on the given +string+. Returns the # length of the match if any can be made, +nil+ otherwise. Accepts the same # +options+ as #parse. def test(string, options={}) parse(string, options).length rescue ParseError nil end # Tests the given +obj+ for case equality with this rule. def ===(obj) !test(obj).nil? 
end # Returns +true+ if this rule is a Terminal. def terminal? false end # Returns +true+ if this rule should extend a match but should not appear in # its event stream. def elide? false end # Returns +true+ if this rule needs to be surrounded by parentheses when # using #to_embedded_s. def needs_paren? # :nodoc: is_a?(Nonterminal) && rules.length > 1 end # Returns the Citrus notation of this rule as a string. def to_s if label "#{label}:" + (needs_paren? ? "(#{to_citrus})" : to_citrus) else to_citrus end end # This alias allows strings to be compared to the string representation of # Rule objects. It is most useful in assertions in unit tests, e.g.: # # assert_equal('"a" | "b"', rule) # alias_method :to_str, :to_s # Returns the Citrus notation of this rule as a string that is suitable to # be embedded in the string representation of another rule. def to_embedded_s # :nodoc: if name name.to_s else needs_paren? && label.nil? ? "(#{to_s})" : to_s end end def ==(other) case other when Rule to_s == other.to_s else super end end alias_method :eql?, :== def inspect # :nodoc: to_s end def extend_match(match) # :nodoc: match.extend(extension) if extension end end # A Proxy is a Rule that is a placeholder for another rule. It stores the # name of some other rule in the grammar internally and resolves it to the # actual Rule object at runtime. This lazy evaluation permits creation of # Proxy objects for rules that may not yet be defined. module Proxy include Rule def initialize(rule_name='') self.rule_name = rule_name end # Sets the name of the rule this rule is proxy for. def rule_name=(rule_name) @rule_name = rule_name.to_sym end # The name of this proxy's rule. attr_reader :rule_name # Returns the underlying Rule for this proxy. def rule @rule ||= resolve! end # Returns an array of events for this rule on the given +input+. 
    def exec(input, events=[])
      index = events.size

      if input.exec(rule, events).size > index
        # Proxy objects insert themselves into the event stream in place of the
        # rule they are proxy for.
        events[index] = self
      end

      events
    end

    # Returns +true+ if this rule should extend a match but should not appear in
    # its event stream.
    def elide? # :nodoc:
      rule.elide?
    end

    def extend_match(match) # :nodoc:
      # Proxy objects preserve the extension of the rule they are proxy for, and
      # may also use their own extension.
      rule.extend_match(match)
      super
    end
  end

  # An Alias is a Proxy for a rule in the same grammar. It is used in rule
  # definitions when a rule calls some other rule by name. The Citrus notation
  # is simply the name of another rule without any other punctuation, e.g.:
  #
  #     name
  #
  class Alias
    include Proxy

    # Returns the Citrus notation of this rule as a string.
    def to_citrus # :nodoc:
      rule_name.to_s
    end

  private

    # Searches this proxy's grammar and any included grammars for a rule with
    # this proxy's #rule_name. Raises an error if one cannot be found.
    def resolve!
      rule = grammar.rule(rule_name)

      unless rule
        raise Error, "No rule named \"#{rule_name}\" in grammar #{grammar}"
      end

      rule
    end
  end

  # A Super is a Proxy for a rule of the same name that was defined previously
  # in the grammar's inheritance chain. Thus, a Super works like Ruby's +super+,
  # only for rules in a grammar instead of methods in a module. The Citrus
  # notation is the word +super+ without any other punctuation, e.g.:
  #
  #     super
  #
  class Super
    include Proxy

    # Returns the Citrus notation of this rule as a string.
    def to_citrus # :nodoc:
      'super'
    end

  private

    # Searches this proxy's included grammars for a rule with this proxy's
    # #rule_name. Raises an error if one cannot be found.
    def resolve!
rule = grammar.super_rule(rule_name) unless rule raise Error, "No rule named \"#{rule_name}\" in hierarchy of grammar #{grammar}" end rule end end # A Terminal is a Rule that matches directly on the input stream and may not # contain any other rule. Terminals are essentially wrappers for regular # expressions. As such, the Citrus notation is identical to Ruby's regular # expression notation, e.g.: # # /expr/ # # Character classes and the dot symbol may also be used in Citrus notation for # compatibility with other parsing expression implementations, e.g.: # # [a-zA-Z] # . # # Character classes have the same semantics as character classes inside Ruby # regular expressions. The dot matches any character, including newlines. class Terminal include Rule def initialize(regexp=/^/) @regexp = regexp end # The actual Regexp object this rule uses to match. attr_reader :regexp # Returns an array of events for this rule on the given +input+. def exec(input, events=[]) match = input.scan(@regexp) if match events << self events << CLOSE events << match.length end events end # Returns +true+ if this rule is case sensitive. def case_sensitive? !@regexp.casefold? end def ==(other) case other when Regexp @regexp == other else super end end alias_method :eql?, :== # Returns +true+ if this rule is a Terminal. def terminal? # :nodoc: true end # Returns the Citrus notation of this rule as a string. def to_citrus # :nodoc: @regexp.inspect end end # A StringTerminal is a Terminal that may be instantiated from a String # object. The Citrus notation is any sequence of characters enclosed in either # single or double quotes, e.g.: # # 'expr' # "expr" # # This notation works the same as it does in Ruby; i.e. strings in double # quotes may contain escape sequences while strings in single quotes may not. 
# In order to specify that a string should ignore case when matching, enclose # it in backticks instead of single or double quotes, e.g.: # # `expr` # # Besides case sensitivity, case-insensitive strings have the same semantics # as double-quoted strings. class StringTerminal < Terminal # The +flags+ will be passed directly to Regexp#new. def initialize(rule='', flags=0) super(Regexp.new(Regexp.escape(rule), flags)) @string = rule end def ==(other) case other when String @string == other else super end end alias_method :eql?, :== # Returns the Citrus notation of this rule as a string. def to_citrus # :nodoc: if case_sensitive? @string.inspect else @string.inspect.gsub(/^"|"$/, '`') end end end # A Nonterminal is a Rule that augments the matching behavior of one or more # other rules. Nonterminals may not match directly on the input, but instead # invoke the rule(s) they contain to determine if a match can be made from # the collective result. module Nonterminal include Rule def initialize(rules=[]) @rules = rules.map {|r| Rule.for(r) } end # An array of the actual Rule objects this rule uses to match. attr_reader :rules def grammar=(grammar) # :nodoc: super @rules.each {|r| r.grammar = grammar } end end # An AndPredicate is a Nonterminal that contains a rule that must match. Upon # success an empty match is returned and no input is consumed. The Citrus # notation is any expression preceded by an ampersand, e.g.: # # &expr # class AndPredicate include Nonterminal def initialize(rule='') super([rule]) end # Returns the Rule object this rule uses to match. def rule rules[0] end # Returns an array of events for this rule on the given +input+. def exec(input, events=[]) if input.test(rule) events << self events << CLOSE events << 0 end events end # Returns the Citrus notation of this rule as a string. def to_citrus # :nodoc: '&' + rule.to_embedded_s end end # A NotPredicate is a Nonterminal that contains a rule that must not match. 
  # Upon success an empty match is returned and no input is consumed. The Citrus
  # notation is any expression preceded by an exclamation mark, e.g.:
  #
  #     !expr
  #
  class NotPredicate
    include Nonterminal

    def initialize(rule='')
      super([rule])
    end

    # Returns the Rule object this rule uses to match.
    def rule
      rules[0]
    end

    # Returns an array of events for this rule on the given +input+.
    def exec(input, events=[])
      unless input.test(rule)
        events << self
        events << CLOSE
        events << 0
      end

      events
    end

    # Returns the Citrus notation of this rule as a string.
    def to_citrus # :nodoc:
      '!' + rule.to_embedded_s
    end
  end

  # A ButPredicate is a Nonterminal that consumes all characters until its rule
  # matches. It must match at least one character in order to succeed. The
  # Citrus notation is any expression preceded by a tilde, e.g.:
  #
  #     ~expr
  #
  class ButPredicate
    include Nonterminal

    DOT_RULE = Rule.for(DOT)

    def initialize(rule='')
      super([rule])
    end

    # Returns the Rule object this rule uses to match.
    def rule
      rules[0]
    end

    # Returns an array of events for this rule on the given +input+.
    def exec(input, events=[])
      length = 0

      until input.test(rule)
        len = input.exec(DOT_RULE)[-1]
        break unless len
        length += len
      end

      if length > 0
        events << self
        events << CLOSE
        events << length
      end

      events
    end

    # Returns the Citrus notation of this rule as a string.
    def to_citrus # :nodoc:
      '~' + rule.to_embedded_s
    end
  end

  # A Repeat is a Nonterminal that specifies a minimum and maximum number of
  # times its rule must match. The Citrus notation is an integer, +N+, followed
  # by an asterisk, followed by another integer, +M+, all of which follow any
  # other expression, e.g.:
  #
  #     expr N*M
  #
  # In this notation +N+ specifies the minimum number of times the preceding
  # expression must match and +M+ specifies the maximum. If +N+ is omitted,
  # it is assumed to be 0. Likewise, if +M+ is omitted, it is assumed to be
  # infinity (no maximum).
Thus, an expression followed by only an asterisk may # match any number of times, including zero. # # The shorthand notation + and ? may be used for the common # cases of 1* and *1 respectively, e.g.: # # expr+ # expr? # class Repeat include Nonterminal def initialize(rule='', min=1, max=Infinity) raise ArgumentError, "Min cannot be greater than max" if min > max super([rule]) @min = min @max = max end # Returns the Rule object this rule uses to match. def rule rules[0] end # Returns an array of events for this rule on the given +input+. def exec(input, events=[]) events << self index = events.size start = index - 1 length = n = 0 while n < max && input.exec(rule, events).size > index length += events[-1] index = events.size n += 1 end if n >= min events << CLOSE events << length else events.slice!(start, index) end events end # The minimum number of times this rule must match. attr_reader :min # The maximum number of times this rule may match. attr_reader :max # Returns the operator this rule uses as a string. Will be one of # +, ?, or N*M. def operator @operator ||= case [min, max] when [0, 0] then '' when [0, 1] then '?' when [1, Infinity] then '+' else [min, max].map {|n| n == 0 || n == Infinity ? '' : n.to_s }.join('*') end end # Returns the Citrus notation of this rule as a string. def to_citrus # :nodoc: rule.to_embedded_s + operator end end # A Sequence is a Nonterminal where all rules must match. The Citrus notation # is two or more expressions separated by a space, e.g.: # # expr expr # class Sequence include Nonterminal # Returns an array of events for this rule on the given +input+. def exec(input, events=[]) events << self index = events.size start = index - 1 length = n = 0 m = rules.length while n < m && input.exec(rules[n], events).size > index length += events[-1] index = events.size n += 1 end if n == m events << CLOSE events << length else events.slice!(start, index) end events end # Returns the Citrus notation of this rule as a string. 
def to_citrus # :nodoc: rules.map {|r| r.to_embedded_s }.join(' ') end end # A Choice is a Nonterminal where only one rule must match. The Citrus # notation is two or more expressions separated by a vertical bar, e.g.: # # expr | expr # class Choice include Nonterminal # Returns an array of events for this rule on the given +input+. def exec(input, events=[]) events << self index = events.size n = 0 m = rules.length while n < m && input.exec(rules[n], events).size == index n += 1 end if index < events.size events << CLOSE events << events[-2] else events.pop end events end # Returns +true+ if this rule should extend a match but should not appear in # its event stream. def elide? # :nodoc: true end # Returns the Citrus notation of this rule as a string. def to_citrus # :nodoc: rules.map {|r| r.to_embedded_s }.join(' | ') end end # The base class for all matches. Matches are organized into a tree where any # match may contain any number of other matches. Nodes of the tree are lazily # instantiated as needed. This class provides several convenient tree # traversal methods that help when examining and interpreting parse results. class Match def initialize(input, events=[], offset=0) @input = input @offset = offset @captures = nil @matches = nil if events.length > 0 elisions = [] while events[0].elide? elisions.unshift(events.shift) events.slice!(-2, events.length) end events[0].extend_match(self) elisions.each do |rule| rule.extend_match(self) end else # Create a default stream of events for the given string. string = input.to_str events = [Rule.for(string), CLOSE, string.length] end @events = events end # The original Input this Match was generated on. attr_reader :input # The index of this match in the #input. attr_reader :offset # The array of events for this match. attr_reader :events # Returns the length of this match. 
def length events.last end # Convenient shortcut for +input.source+ def source (input.respond_to?(:source) && input.source) || input end # Returns the slice of the source text that this match captures. def string @string ||= input.to_str[offset, length] end # Returns a hash of capture names to arrays of matches with that name, # in the order they appeared in the input. def captures(name = nil) process_events! unless @captures name ? @captures[name] : @captures end # Convenient method for captures[name].first. def capture(name) captures[name].first end # Returns an array of all immediate submatches of this match. def matches process_events! unless @matches @matches end # A shortcut for retrieving the first immediate submatch of this match. def first matches.first end alias_method :to_s, :string # This alias allows strings to be compared to the string value of Match # objects. It is most useful in assertions in unit tests, e.g.: # # assert_equal("a string", match) # alias_method :to_str, :to_s # The default value for a match is its string value. This method is # overridden in most cases to be more meaningful according to the desired # interpretation. alias_method :value, :to_s # Returns this match plus all sub #matches in an array. def to_a [self] + matches end # Returns the capture at the given +key+. If it is an Integer (and an # optional length) or a Range, the result of #to_a with the same arguments # is returned. Otherwise, the value at +key+ in #captures is returned. def [](key, *args) case key when Integer, Range to_a[key, *args] else captures[key] end end def ==(other) case other when String string == other when Match string == other.to_s else super end end alias_method :eql?, :== def inspect string.inspect end # Prints the entire subtree of this match using the given +indent+ to # indicate nested match levels. Useful for debugging. 
def dump(indent=' ') lines = [] stack = [] offset = 0 close = false index = 0 last_length = nil while index < @events.size event = @events[index] if close os = stack.pop start = stack.pop rule = stack.pop space = indent * (stack.size / 3) string = self.string.slice(os, event) lines[start] = "#{space}#{string.inspect} rule=#{rule}, offset=#{os}, length=#{event}" last_length = event unless last_length close = false elsif event == CLOSE close = true else if last_length offset += last_length last_length = nil end stack << event stack << index stack << offset end index += 1 end puts lines.compact.join("\n") end private # Initializes both the @captures and @matches instance variables. def process_events! @captures = captures_hash @matches = [] capture!(@events[0], self) @captures[0] = self stack = [] offset = 0 close = false index = 0 last_length = nil capture = true while index < @events.size event = @events[index] if close start = stack.pop if Rule === start rule = start os = stack.pop start = stack.pop match = Match.new(input, @events[start..index], @offset + os) capture!(rule, match) if stack.size == 1 @matches << match @captures[@matches.size] = match end capture = true end last_length = event unless last_length close = false elsif event == CLOSE close = true else stack << index # We can calculate the offset of this rule event by adding back the # last match length. if last_length offset += last_length last_length = nil end if capture && stack.size != 1 stack << offset stack << event # We should not create captures when traversing a portion of the # event stream that is masked by a proxy in the original rule # definition. capture = false if Proxy === event end end index += 1 end end def capture!(rule, match) # We can lookup matches that were created by proxy by the name of # the rule they are proxy for. 
if Proxy === rule if @captures.key?(rule.rule_name) @captures[rule.rule_name] << match else @captures[rule.rule_name] = [match] end end # We can lookup matches that were created by rules with labels by # that label. if rule.label if @captures.key?(rule.label) @captures[rule.label] << match else @captures[rule.label] = [match] end end end # Returns a new Hash that is to be used for @captures. This hash normalizes # String keys to Symbols, returns +nil+ for unknown Numeric keys, and an # empty Array for all other unknown keys. def captures_hash Hash.new do |hash, key| case key when String hash[key.to_sym] when Numeric nil else [] end end end end end citrus-3.0.2/lib/citrus/0000755000175000017500000000000013124403235014125 5ustar pravipravicitrus-3.0.2/lib/citrus/file.rb0000644000175000017500000002025613124403235015376 0ustar pravipravi# encoding: UTF-8 require 'citrus' module Citrus # Some helper methods for rules that alias +module_name+ and don't want to # use +Kernel#eval+ to retrieve Module objects. module ModuleNameHelpers #:nodoc: def module_name capture(:module_name) end def module_segments @module_segments ||= module_name.value.split('::') end def module_namespace module_segments[0...-1].inject(Object) do |namespace, constant| constant.empty? ? namespace : namespace.const_get(constant) end end def module_basename module_segments.last end end # A grammar for Citrus grammar files. This grammar is used in Citrus.eval to # parse and evaluate Citrus grammars and serves as a prime example of how to # create a complex grammar complete with semantic interpretation in pure Ruby. File = Grammar.new do #:nodoc: ## Hierarchical syntax rule :file do all(:space, zero_or_more(any(:require, :grammar))) { captures[:require].each do |req| file = req.value begin require file rescue ::LoadError => e begin Citrus.require(file) rescue LoadError # Re-raise the original LoadError. 
raise e end end end captures[:grammar].map {|g| g.value } } end rule :grammar do mod all(:grammar_keyword, :module_name, zero_or_more(any(:include, :root, :rule)), :end_keyword) do include ModuleNameHelpers def value grammar = module_namespace.const_set(module_basename, Grammar.new) captures[:include].each {|inc| grammar.include(inc.value) } captures[:rule].each {|r| grammar.rule(r.rule_name.value, r.value) } root = capture(:root) grammar.root(root.value) if root grammar end end end rule :rule do mod all(:rule_keyword, :rule_name, zero_or_one(:expression), :end_keyword) do def rule_name capture(:rule_name) end def value # An empty rule definition matches the empty string. expr = capture(:expression) expr ? expr.value : Rule.for('') end end end rule :expression do all(:sequence, zero_or_more([['|', zero_or_one(:space)], :sequence])) { rules = captures[:sequence].map {|s| s.value } rules.length > 1 ? Choice.new(rules) : rules.first } end rule :sequence do one_or_more(:labelled) { rules = captures[:labelled].map {|l| l.value } rules.length > 1 ? 
Sequence.new(rules) : rules.first } end rule :labelled do all(zero_or_one(:label), :extended) { label = capture(:label) rule = capture(:extended).value rule.label = label.value if label rule } end rule :extended do all(:prefix, zero_or_one(:extension)) { extension = capture(:extension) rule = capture(:prefix).value rule.extension = extension.value if extension rule } end rule :prefix do all(zero_or_one(:predicate), :suffix) { predicate = capture(:predicate) rule = capture(:suffix).value rule = predicate.value(rule) if predicate rule } end rule :suffix do all(:primary, zero_or_one(:repeat)) { repeat = capture(:repeat) rule = capture(:primary).value rule = repeat.value(rule) if repeat rule } end rule :primary do any(:grouping, :proxy, :terminal) end rule :grouping do all(['(', zero_or_one(:space)], :expression, [')', zero_or_one(:space)]) { capture(:expression).value } end ## Lexical syntax rule :require do all(:require_keyword, :quoted_string) { capture(:quoted_string).value } end rule :include do mod all(:include_keyword, :module_name) do include ModuleNameHelpers def value module_namespace.const_get(module_basename) end end end rule :root do all(:root_keyword, :rule_name) { capture(:rule_name).value } end # Rule names may contain letters, numbers, underscores, and dashes. They # MUST start with a letter. 
rule :rule_name do all(/[a-zA-Z][a-zA-Z0-9_-]*/, :space) { first.to_s } end rule :proxy do any(:super, :alias) end rule :super do ext(:super_keyword) { Super.new } end rule :alias do all(notp(:end_keyword), :rule_name) { Alias.new(capture(:rule_name).value) } end rule :terminal do any(:quoted_string, :case_insensitive_string, :regular_expression, :character_class, :dot) { primitive = super() if String === primitive StringTerminal.new(primitive, flags) else Terminal.new(primitive) end } end rule :quoted_string do mod all(/(["'])(?:\\?.)*?\1/, :space) do def value eval(first.to_s) end def flags 0 end end end rule :case_insensitive_string do mod all(/`(?:\\?.)*?`/, :space) do def value eval(first.to_s.gsub(/^`|`$/, '"')) end def flags Regexp::IGNORECASE end end end rule :regular_expression do all(/\/(?:\\?.)*?\/[imxouesn]*/, :space) { eval(first.to_s) } end rule :character_class do all(/\[(?:\\?.)*?\]/, :space) { eval("/#{first.to_s.gsub('/', '\\/')}/") } end rule :dot do all('.', :space) { DOT } end rule :label do all(/[a-zA-Z0-9_]+/, :space, ':', :space) { first.to_str.to_sym } end rule :extension do any(:tag, :block) end rule :tag do mod all( ['<', zero_or_one(:space)], :module_name, ['>', zero_or_one(:space)] ) do include ModuleNameHelpers def value module_namespace.const_get(module_basename) end end end rule :block do all( '{', zero_or_more(any(:block, /[^{}]+/)), ['}', zero_or_one(:space)] ) { proc = eval("Proc.new #{to_s}", TOPLEVEL_BINDING) # Attempt to detect if this is a module block using some # extremely simple heuristics. 
if to_s =~ /\b(def|include) / Module.new(&proc) else proc end } end rule :predicate do any(:and, :not, :but) end rule :and do all('&', :space) { |rule| AndPredicate.new(rule) } end rule :not do all('!', :space) { |rule| NotPredicate.new(rule) } end rule :but do all('~', :space) { |rule| ButPredicate.new(rule) } end rule :repeat do any(:question, :plus, :star) end rule :question do all('?', :space) { |rule| Repeat.new(rule, 0, 1) } end rule :plus do all('+', :space) { |rule| Repeat.new(rule, 1, Infinity) } end rule :star do all(/[0-9]*/, '*', /[0-9]*/, :space) { |rule| min = captures[1] == '' ? 0 : captures[1].to_str.to_i max = captures[3] == '' ? Infinity : captures[3].to_str.to_i Repeat.new(rule, min, max) } end rule :module_name do all(one_or_more([ zero_or_one('::'), :constant ]), :space) { first.to_s } end rule :require_keyword, [ /\brequire\b/, :space ] rule :include_keyword, [ /\binclude\b/, :space ] rule :grammar_keyword, [ /\bgrammar\b/, :space ] rule :root_keyword, [ /\broot\b/, :space ] rule :rule_keyword, [ /\brule\b/, :space ] rule :super_keyword, [ /\bsuper\b/, :space ] rule :end_keyword, [ /\bend\b/, :space ] rule :constant, /[A-Z][a-zA-Z0-9_]*/ rule :white, /[ \t\n\r]/ rule :comment, /#.*/ rule :space, zero_or_more(any(:white, :comment)) end def File.parse(*) # :nodoc: super rescue ParseError => e # Raise SyntaxError when a parse fails. raise SyntaxError, e end end citrus-3.0.2/lib/citrus/grammars.rb0000644000175000017500000000056513124403235016271 0ustar pravipravi# Require this file to use any of the bundled Citrus grammars. 
# # require 'citrus/grammars' # Citrus.require 'uri' # # match = UniformResourceIdentifier.parse(uri_string) # # => # require 'citrus' grammars = ::File.expand_path(::File.join('..', 'grammars'), __FILE__) $LOAD_PATH.unshift(grammars) unless $LOAD_PATH.include?(grammars) citrus-3.0.2/lib/citrus/core_ext.rb0000644000175000017500000000070513124403235016264 0ustar pravipravirequire 'citrus' class Object # A sugar method for creating Citrus grammars from any namespace. # # grammar :Calc do # end # # module MyModule # grammar :Calc do # end # end # def grammar(name, &block) namespace = respond_to?(:const_set) ? self : Object namespace.const_set(name, Citrus::Grammar.new(&block)) rescue NameError raise ArgumentError, "Invalid grammar name: #{name}" end end citrus-3.0.2/lib/citrus/version.rb0000644000175000017500000000031313124403235016134 0ustar pravipravimodule Citrus # The current version of Citrus as [major, minor, patch]. VERSION = [3, 0, 2] # Returns the current version of Citrus as a string. def self.version VERSION.join('.') end end citrus-3.0.2/doc/0000755000175000017500000000000013124403235012613 5ustar pravipravicitrus-3.0.2/doc/testing.markdown0000644000175000017500000000411513124403235016035 0ustar pravipravi# Testing Citrus was designed to facilitate simple and powerful testing of grammars. To demonstrate how this is to be done, we'll use the `Addition` grammar from our previous [example](example.html). The following code demonstrates a simple test case that could be used to test that our grammar works properly. 

    class AdditionTest < Test::Unit::TestCase
      def test_additive
        match = Addition.parse('23 + 12', :root => :additive)
        assert(match)
        assert_equal('23 + 12', match)
        assert_equal(35, match.value)
      end

      def test_number
        match = Addition.parse('23', :root => :number)
        assert(match)
        assert_equal('23', match)
        assert_equal(23, match.value)
      end
    end

The key here is using the `:root` option when performing the parse to specify the name of the rule at which the parse should start. In `test_number`, since `:number` was given, the parse will start at that rule as if it were the root rule of the entire grammar. The ability to change the root rule on the fly like this enables easy unit testing of the entire grammar.

Also note that because match objects can be compared directly with strings, assertions may be made to test equality of match objects with string values.

## Debugging

When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) is raised which provides a wealth of information about exactly where the parse failed, including the offset, line number, line text, and line offset. Using this object, you can give the user useful feedback about why the input was bad. The following code demonstrates one way to do this.

    def parse_some_stuff(stuff)
      match = StuffGrammar.parse(stuff)
    rescue Citrus::ParseError => e
      raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
        [e.line_number, e.line_offset]
    end

In addition to useful error objects, Citrus also includes a means of visualizing match trees in the console via `Match#dump`. This can help when determining which rules are generating which matches and how they are organized in the match tree.

citrus-3.0.2/doc/syntax.markdown

# Syntax

The most straightforward way to compose a Citrus grammar is to use Citrus' own custom grammar syntax. This syntax borrows heavily from Ruby, so it should already be familiar to Ruby programmers.

## Terminals

Terminals may be represented by a string or a regular expression. Both follow the same rules as Ruby string and regular expression literals.

    'abc'         # match "abc"
    "abc\n"       # match "abc\n"
    /abc/i        # match "abc" in any case
    /\xFF/        # match "\xFF"

Character classes and the dot (match anything) symbol are supported as well for compatibility with other parsing expression implementations.

    [a-z0-9]      # match any lowercase letter or digit
    [\x00-\xFF]   # match any octet
    .             # match any single character, including new lines

Also, strings may use backticks instead of quotes to indicate that they should match in a case-insensitive manner.

    `abc`         # match "abc" in any case

Besides case sensitivity, case-insensitive strings have the same behavior as double quoted strings.

See [Terminal](api/classes/Citrus/Terminal.html) and [StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.

## Repetition

Quantifiers may be used after any expression to specify a number of times it must match. The universal form of a quantifier is `N*M` where `N` is the minimum and `M` is the maximum number of times the expression may match.

    'abc'1*2      # match "abc" a minimum of one, maximum of two times
    'abc'1*       # match "abc" at least once
    'abc'*2       # match "abc" a maximum of twice

Additionally, the minimum and maximum may be omitted entirely to specify that an expression may match zero or more times.

    'abc'*        # match "abc" zero or more times

The `+` and `?` operators are supported as well for the common cases of `1*` and `*1` respectively.

    'abc'+        # match "abc" one or more times
    'abc'?        # match "abc" zero or one time

See [Repeat](api/classes/Citrus/Repeat.html) for more information.

## Lookahead

Both positive and negative lookahead are supported in Citrus. Use the `&` and `!` operators to indicate that an expression either should or should not match. In neither case is any input consumed.

    'a' &'b'      # match an "a" that is followed by a "b"
    'a' !'b'      # match an "a" that is not followed by a "b"
    !'a' .        # match any character except for "a"

A special form of lookahead is also supported which will match any character that does not match a given expression.

    ~'a'          # match all characters until an "a"
    ~/xyz/        # match all characters until /xyz/ matches

When using this operator (the tilde), at least one character must be consumed for the rule to succeed.

See [AndPredicate](api/classes/Citrus/AndPredicate.html), [NotPredicate](api/classes/Citrus/NotPredicate.html), and [ButPredicate](api/classes/Citrus/ButPredicate.html) for more information.

## Sequences

Sequences of expressions may be separated by a space to indicate that the rules should match in that order.

    'a' 'b' 'c'   # match "a", then "b", then "c"
    'a' [0-9]     # match "a", then a numeric digit

See [Sequence](api/classes/Citrus/Sequence.html) for more information.

## Choices

Ordered choice is indicated by a vertical bar that separates two expressions. When using choice, each expression is tried in order. When one matches, the rule returns the match immediately without trying the remaining rules.

    'a' | 'b'       # match "a" or "b"
    'a' 'b' | 'c'   # match "a" then "b" (in sequence), or "c"

It is important to note when using ordered choice that any operator binds more tightly than the vertical bar. A full chart of operators and their respective levels of precedence is below.

See [Choice](api/classes/Citrus/Choice.html) for more information.

## Labels

Match objects may be referred to by a different name than the rule that originally generated them. Labels are added by placing the label and a colon immediately preceding any expression.

    chars:/[a-z]+/  # the characters matched by the regular expression
                    # may be referred to as "chars" in an extension
                    # method

## Extensions

Extensions may be specified using either "module" or "block" syntax.

When using module syntax, specify the name of a module that is used to extend match objects in between less than and greater than symbols.

    [a-z0-9]5*9 <CouponCode>  # match a string that consists of any lower
                              # cased letter or digit between 5 and 9
                              # times and extend the match with the
                              # CouponCode module

Additionally, extensions may be specified inline using curly braces. When using this method, the code inside the curly braces may be invoked by calling the `value` method on the match object.

    [0-9] { to_str.to_i }   # match any digit and return its integer value when
                            # calling the #value method on the match object

Note that when using the inline block method you may also specify arguments in between vertical bars immediately following the opening curly brace, just like in Ruby blocks.

## Super

When including a grammar inside another, all rules in the child that have the same name as a rule in the parent also have access to the `super` keyword to invoke the parent rule.

    grammar Number
      rule number
        [0-9]+
      end
    end

    grammar FloatingPoint
      include Number

      rule number
        super ('.' super)?
      end
    end

In the example above, the `FloatingPoint` grammar includes `Number`. Both have a rule named `number`, so `FloatingPoint#number` has access to `Number#number` by means of using `super`.

See [Super](api/classes/Citrus/Super.html) for more information.

## Precedence

The following table contains a list of all Citrus symbols and operators and their precedence. A higher precedence indicates tighter binding.

Operator                  | Name                      | Precedence
------------------------- | ------------------------- | ----------
`''`                      | String (single quoted)    | 7
`""`                      | String (double quoted)    | 7
<code>``</code>           | String (case insensitive) | 7
`[]`                      | Character class           | 7
`.`                       | Dot (any character)       | 7
`//`                      | Regular expression        | 7
`()`                      | Grouping                  | 7
`*`                       | Repetition (arbitrary)    | 6
`+`                       | Repetition (one or more)  | 6
`?`                       | Repetition (zero or one)  | 6
`&`                       | And predicate             | 5
`!`                       | Not predicate             | 5
`~`                       | But predicate             | 5
`<>`                      | Extension (module name)   | 4
`{}`                      | Extension (literal)       | 4
`:`                       | Label                     | 3
`e1 e2`                   | Sequence                  | 2
`e1 \| e2`                | Ordered choice            | 1

## Grouping

As is common in many programming languages, parentheses may be used to override the normal binding order of operators. In the following example parentheses are used to make the vertical bar between `'b'` and `'c'` bind tighter than the space between `'a'` and `'b'`.

    'a' ('b' | 'c')   # match "a", then "b" or "c"

citrus-3.0.2/doc/example.markdown

# Example

Below is an example of a simple grammar that is able to parse strings of integers separated by any amount of white space and a `+` symbol.

    grammar Addition
      rule additive
        number plus (additive | number)
      end

      rule number
        [0-9]+ space
      end

      rule plus
        '+' space
      end

      rule space
        [ \t]*
      end
    end

Several things to note about the above example:

  * Grammar and rule declarations end with the `end` keyword
  * A sequence of rules is created by separating expressions with a space
  * Likewise, ordered choice is represented with a vertical bar
  * Parentheses may be used to override the natural binding order
  * Rules may refer to other rules in their own definitions simply by using the other rule's name
  * Any expression may be followed by a quantifier

## Interpretation

The grammar above is able to parse simple mathematical expressions such as "1+2" and "1 + 2+3", but it does not have enough semantic information to be able to actually interpret these expressions.
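
As a rough illustration of what recognizing without interpreting means, the sketch below hand-writes a recognizer for the same rules in plain Ruby, using `StringScanner` from the standard library. It is not how Citrus works internally, and the class name `AdditionRecognizer` and its methods are invented for this example. It only answers whether the input conforms to the grammar and computes nothing from it.

```ruby
require 'strscan'

# Hypothetical hand-written recognizer mirroring the Addition rules.
# Each private method returns true and advances the scanner on success.
class AdditionRecognizer
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Succeeds only if the root rule (additive) consumes the entire input.
  def recognize?
    additive && @scanner.eos?
  end

  private

  # additive <- number plus (additive | number)
  # On failure the scanner is rewound so the next alternative of an
  # ordered choice starts from the same position.
  def additive
    start = @scanner.pos
    return true if number && plus && (additive || number)
    @scanner.pos = start
    false
  end

  # number <- [0-9]+ space
  def number
    !@scanner.scan(/[0-9]+[ \t]*/).nil?
  end

  # plus <- '+' space
  def plus
    !@scanner.scan(/\+[ \t]*/).nil?
  end
end

AdditionRecognizer.new('1 + 2+3').recognize?  # => true
AdditionRecognizer.new('1 +').recognize?      # => false
```

Note how `additive || number` tries the alternatives strictly in order, which is the behavior the vertical bar has in a parsing expression grammar.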

At this point, when the grammar parses a string it generates a tree of [Match](api/classes/Citrus/Match.html) objects. Each match is created by a rule and may itself be composed of any number of submatches.

Submatches are created whenever a rule contains another rule. For example, in the grammar above `number` matches a string of digits followed by white space. Thus, a match generated by this rule will contain two submatches.

We can define a method inside a set of curly braces that will be used to extend a particular rule's matches. This works in similar fashion to using Ruby's blocks. Let's extend the `Addition` grammar using this technique.

    grammar Addition
      rule additive
        (number plus term:(additive | number)) {
          capture(:number).value + capture(:term).value
        }
      end

      rule number
        ([0-9]+ space) {
          to_str.to_i
        }
      end

      rule plus
        '+' space
      end

      rule space
        [ \t]*
      end
    end

In this version of the grammar we have added two semantic blocks, one each for the `additive` and `number` rules. These blocks contain code that we can execute by calling `value` on match objects that result from those rules. It's easiest to explain what is going on here by starting with the lowest level block, which is defined within `number`.

Inside this block we see a call to `to_str`. When called in the context of a match object, this method returns the text of the match as a Ruby string. As of Citrus 3.0, string methods are no longer forwarded to that text via `method_missing`, so `to_i` must be called on the result of `to_str`. Thus, the call to `to_str.to_i` returns the integer value of the match.

Similarly, matches created by `additive` will also have a `value` method. Notice the use of the `term` label within the rule definition. This label allows the match that is created by the choice between `additive` and `number` to be retrieved using `capture(:term)`. The value of an additive match is determined to be the values of its `number` and `term` matches added together using Ruby's addition operator.
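
The recursive evaluation described above can be pictured with plain Ruby objects standing in for matches. This is only a sketch of the idea: `NumberNode` and `AdditiveNode` are hypothetical names and not part of Citrus, and the tree is built by hand rather than by a parser.

```ruby
# Hypothetical stand-in for a match produced by the `number` rule.
NumberNode = Struct.new(:text) do
  def value
    text.to_i  # String#to_i ignores the trailing white space
  end
end

# Hypothetical stand-in for a match produced by the `additive` rule.
# `term` is either another AdditiveNode or a NumberNode.
AdditiveNode = Struct.new(:number, :term) do
  def value
    number.value + term.value
  end
end

# Hand-built tree for the input "1 + 2 + 3".
tree = AdditiveNode.new(
  NumberNode.new('1 '),
  AdditiveNode.new(NumberNode.new('2 '), NumberNode.new('3'))
)

tree.value  # => 6
```

Calling `value` on the root walks down the right-nested `term` chain, so each level only needs to know how to combine its own two children.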

Since `additive` is the first rule defined in the grammar, any match that results from parsing a string with this grammar will have a `value` method that can be used to recursively calculate the collective value of the entire match tree.

To give it a try, save the code for the `Addition` grammar in a file called addition.citrus. Next, assuming you have the Citrus [gem](https://rubygems.org/gems/citrus) installed, try the following sequence of commands in a terminal.

    $ irb
    > require 'citrus'
     => true
    > Citrus.load 'addition'
     => [Addition]
    > m = Addition.parse '1 + 2 + 3'
     => #<Citrus::Match ...>
    > m.value
     => 6

Congratulations! You just ran your first piece of Citrus code.

One interesting thing to notice about the above sequence of commands is the return value of [Citrus.load](api/classes/Citrus.html#M000003). When you use `Citrus.load` to load a grammar file (and likewise [Citrus.eval](api/classes/Citrus.html#M000004) to evaluate a raw string of grammar code), the return value is an array of all the grammars present in that file.

Take a look at [examples/calc.citrus](http://github.com/mjackson/citrus/blob/master/examples/calc.citrus) for an example of a calculator that is able to parse and evaluate more complex mathematical expressions.

## Additional Methods

If you need more than just a `value` method on your match object, you can attach additional methods as well. There are two ways to do this. The first lets you define additional methods inline in your semantic block. This block will be used to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682). Using the `Addition` example above, we might refactor the `additive` rule to look like this:

    rule additive
      (number plus term:(additive | number)) {
        def lhs
          capture(:number).value
        end

        def rhs
          capture(:term).value
        end

        def value
          lhs + rhs
        end
      }
    end

Now, in addition to having a `value` method, matches that result from the `additive` rule will have a `lhs` and a `rhs` method as well.
Although not particularly useful in this example, this technique can be useful when unit testing more complex rules. For example, using this method you might make the following assertions in a unit test:

    match = Addition.parse('1 + 4')
    assert_equal(1, match.lhs)
    assert_equal(4, match.rhs)
    assert_equal(5, match.value)

If you would like to abstract away the code in a semantic block, simply create a separate Ruby module (in another file) that contains the extension methods you want and use the angle bracket notation to indicate that a rule should use that module when extending matches.

To demonstrate this method with the above example, in a Ruby file you would define the following module.

    module Additive
      def lhs
        capture(:number).value
      end

      def rhs
        capture(:term).value
      end

      def value
        lhs + rhs
      end
    end

Then, in your Citrus grammar file the rule definition would look like this:

    rule additive
      (number plus term:(additive | number)) <Additive>
    end

This method of defining extensions can help keep your grammar files cleaner. However, you do need to make sure that your extension modules are already loaded before using `Citrus.load` to load your grammar file.

citrus-3.0.2/doc/license.markdown

# License

Copyright 2010 Michael Jackson

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

citrus-3.0.2/doc/index.markdown

Citrus is a compact and powerful parsing library for [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of the language with the simplicity and power of [parsing expressions](http://en.wikipedia.org/wiki/Parsing_expression_grammar).

# Installation

Via [RubyGems](http://rubygems.org/):

    $ gem install citrus

From a local copy:

    $ git clone git://github.com/mjackson/citrus.git
    $ cd citrus
    $ rake package install

citrus-3.0.2/doc/extras.markdown

# Extras

Several files are included in the Citrus repository that make it easier to work with grammar files in various editors.

## TextMate

To install the Citrus [TextMate](http://macromates.com/) bundle, simply double-click on the `Citrus.tmbundle` file in the `extras` directory.

## Vim

To install the [Vim](http://www.vim.org/) scripts, copy the files in `extras/vim` to a directory in Vim's [runtimepath](http://vimdoc.sourceforge.net/htmldoc/options.html#\'runtimepath\').

citrus-3.0.2/doc/links.markdown

# Links

The primary resource for all things to do with parsing expressions can be found on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat) at MIT.

Also, a useful summary of parsing expression grammars can be found on [Wikipedia](http://en.wikipedia.org/wiki/Parsing_expression_grammar).

Citrus draws inspiration from another Ruby library for writing parsing expression grammars, Treetop. While Citrus' syntax is similar to that of [Treetop](http://treetop.rubyforge.org), it's not identical. The link is included here for those who may wish to explore an alternative implementation.

citrus-3.0.2/doc/background.markdown

# Background

In order to be able to use Citrus effectively, you must first understand the difference between syntax and semantics. Syntax is a set of rules that govern the way letters and punctuation may be used in a language. For example, English syntax dictates that proper nouns should start with a capital letter and that sentences should end with a period.

Semantics are the rules by which meaning may be derived in a language. For example, as you read a book you are able to make some sense of the particular way in which words on a page are combined to form thoughts and express ideas because you understand what the words themselves mean and you understand what they mean collectively.

Computers use a similar process when interpreting code. First, the code must be parsed into recognizable symbols or tokens. These tokens may then be passed to an interpreter which is responsible for forming actual instructions from them.

Citrus is a pure Ruby library that allows you to perform both lexical analysis and semantic interpretation quickly and easily. Using Citrus you can write powerful parsers that are simple to understand and easy to create and maintain.

In Citrus, there are three main types of objects: rules, grammars, and matches.

## Rules

A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching behavior on a string. There are two types of rules: terminals and non-terminals.

Terminals can be either Ruby strings or regular expressions that specify some input to match. For example, a terminal created from the string "end" would match any sequence of the characters "e", "n", and "d", in that order. Terminals created from regular expressions may match any sequence of characters that can be generated from that expression.

Non-terminals are rules that may contain other rules but do not themselves match directly on the input. For example, a Repeat is a non-terminal that may contain one other rule that will try and match a certain number of times. Several other types of non-terminals are available that will be discussed later.

Rule objects may also have semantic information associated with them in the form of Ruby modules. Rules use these modules to extend the matches they create.

## Grammars

A [Grammar](api/classes/Citrus/Grammar.html) is a container for rules. Usually the rules in a grammar collectively form a complete specification for some language, or a well-defined subset thereof.

A Citrus grammar is really just a souped-up Ruby [module](http://ruby-doc.org/core/classes/Module.html). These modules may be included in other grammar modules in the same way that Ruby modules are normally used. This property allows you to divide a complex grammar into more manageable, reusable pieces that may be combined at runtime. Any rule with the same name as a rule in an included grammar may access that rule with a mechanism similar to Ruby's `super` keyword.

## Matches

A [Match](api/classes/Citrus/Match.html) object represents a successful recognition of some piece of the input. Matches are created by rule objects during a parse.

Matches are arranged in a tree structure where any match may contain any number of other matches. Each match contains information about its own subtree. The structure of the tree is determined by the way in which the rule that generated each match is used in the grammar.
For example, a match that is created from a nonterminal rule that contains several other terminals will likewise contain several matches, one for each terminal. However, this is an implementation detail and should be relatively transparent to the user. Match objects may be extended with semantic information in the form of methods. These methods should provide various interpretations for the semantic value of a match. citrus-3.0.2/benchmark/0000755000175000017500000000000013124403235014000 5ustar pravipravicitrus-3.0.2/benchmark/seqpar.rb0000644000175000017500000000541613124403235015626 0ustar pravipravi$LOAD_PATH << File.expand_path('../../lib', __FILE__) # Benchmarking written by Bernard Lambeau and Jason Garber of the Treetop # project. # # To test your optimizations: # 1. Run ruby seqpar.rb # 2. cp after.dat before.dat # 3. Make your modifications to the Citrus code # 4. Run ruby seqpar.rb # 5. Run gnuplot seqpar.gnuplot require 'citrus' require 'benchmark' srand(47562) # So it runs the same each time class Array def sum inject(0) {|m, x| m + x } end def mean sum / size end end class SeqParBenchmark OPERATORS = ["seq", "fit", "art" * 5, "par", "sequence"] def initialize @where = File.expand_path('..', __FILE__) Citrus.load(File.join(@where, 'seqpar')) @grammar = SeqPar end # Checks the grammar def check [ "Task", "seq Task end", "par Task end", "seq Task Task end", "par Task Task end", "par seq Task end Task end", "par seq seq Task end end Task end", "seq Task par seq Task end Task end Task end" ].each do |input| @grammar.parse(input) end end # Generates an input text def generate(depth=0) return "Task" if depth > 7 return "seq #{generate(depth + 1)} end" if depth == 0 which = rand(OPERATORS.length) case which when 0 "Task" else raise unless OPERATORS[which] buffer = "#{OPERATORS[which]} " 0.upto(rand(4) + 1) do buffer << generate(depth + 1) << " " end buffer << "end" buffer end end # Launches benchmarking def benchmark number_by_size = Hash.new {|h,k| h[k] = 0} 
time_by_size = Hash.new {|h,k| h[k] = 0} 0.upto(250) do |i| input = generate length = input.length puts "Launching #{i}: #{input.length}" # puts input tms = Benchmark.measure { @grammar.parse(input) } number_by_size[length] += 1 time_by_size[length] += tms.total * 1000 end # puts number_by_size.inspect # puts time_by_size.inspect File.open(File.join(@where, 'after.dat'), 'w') do |dat| number_by_size.keys.sort.each do |size| dat << "#{size} #{(time_by_size[size]/number_by_size[size]).truncate}\n" end end if File.exists?(File.join(@where, 'before.dat')) before = {} performance_increases = [] File.foreach(File.join(@where, 'before.dat')) do |line| size, time = line.split(' ') before[size] = time end File.foreach(File.join(@where, 'after.dat')) do |line| size, time = line.split(' ') performance_increases << (before[size].to_f - time.to_f) / before[size].to_f unless time == "0" || before[size] == "0" end puts "Average performance increase: #{(performance_increases.mean * 100 * 10).round / 10.0}%" end end end SeqParBenchmark.new.benchmark citrus-3.0.2/benchmark/seqpar.citrus0000644000175000017500000000050213124403235016523 0ustar pravipravigrammar SeqPar rule statement 'par ' (statement ' ')+ 'end' | 'sequence' ' ' (statement ' ')+ 'end' | 'seq' ' ' (statement ' ')+ 'end' | ('fit' [\s] (statement ' ')+ 'end') { def foo "foo" end } | 'art'+ [ ] (statement ' ')+ 'end' | [A-Z] [a-zA-z0-9]* end end citrus-3.0.2/benchmark/seqpar.gnuplot0000644000175000017500000000053013124403235016703 0ustar pravipravif1(x) = a*x a = 0.5 fit f1(x) 'before.dat' using 1:2 via a f2(x) = b*x b = 0.5 fit f2(x) 'after.dat' using 1:2 via b set xlabel "Length of input" set ylabel "CPU time to parse" plot a*x title 'a*x (Before)',\ b*x title 'b*x (After)',\ "before.dat" using 1:2 title 'Before', \ "after.dat" using 1:2 title 'After' citrus-3.0.2/CHANGES0000644000175000017500000001001113124403235013032 0ustar pravipravi= 3.0.1 / 2014-03-14 * Fixed bad 3.0.0 release. 
= 3.0.0 / 2014-03-14 * Moved Object#grammar to citrus/core_ext.rb. Citrus no longer installs core extensions by default. Use "require 'citrus/core_ext.rb'" instead of "require 'citrus'" to keep the previous behavior. * Removed Match#method_missing, added #capture(name) and #captures(name) Match#method_missing is unsafe as illustrated in Github issue #41. In particular, it makes composing a grammar with aribitrary gems unsafe (e.g. when the latter make core extensions), leads to unexpected results with labels match existing Kernel methods (e.g. `p`), and prevents Match from getting new methods in a backward compatible way. This commit therefore removes it. In Citrus 2.x, method_missing allowed rule productions to denote captured matches by label name: rule pair (foo ':' bar) { [foo.value, bar.value] } end Also, it allowed invoking String operators on the Match's text: rule int [0-9]+ { to_i } end Those two scenarios no longer work out of the box in Citrus 3.0. You must use capture(label) for the former, and to_str for the latter: rule pair (foo ':' bar) { [capture(:foo).value, capture(:bar).value] } end rule int [0-9]+ { to_str.to_i } end Match#captures now accepts an optional label name as first argument and returns the corresponding array of matches for that label (useful in case the label belongs to a repetition). = 2.5.0 / 2014-03-13 * Inputs may be generated from many different sources, including Pathname and IO objects (thanks blambeau). * Matches keep track of their offset in the original source (thanks blambeau). * Citrus.load no longer raises Citrus::LoadError for files that can't be found or are not readable. Users must rescue Errno::ENOENT instead, for example. * Removed a few ruby warnings (thanks tbuehlmann) = 2.4.1 / 2011-11-04 * Fixed a bug that prevented rule names from starting with "super". * Several minor bug fixes. = 2.4.0 / 2011-05-11 * Fixed a bug that prevented parsing nested blocks correctly (issue #21). * Added URI example. 
* Moved example grammars inside lib/citrus/grammars and added lib/citrus/grammars.rb for easily requiring Citrus example grammars. = 2.3.7 / 2011-02-20 * Fixed a bug that prevented forward slashes from being used inside character class literals. * Added email address example. = 2.3.6 / 2011-02-19 * Fixed a bug that prevented memoization from advancing the input's pointer properly (thanks joachimm). * Several additions to the TextMate bundle (thanks joachimm). = 2.3.5 / 2011-02-07 * Fixed a bug that prevented Match objects from being printed properly using Kernel#puts (thanks joachimm). * Fixed a bug that prevented using rules with names that begin with "end" (thanks Mark Wilden). * Citrus#require accepts relative file paths, in addition to absolute ones. * Simplified/cleaned up some example files. = 2.3.4 / 2011-01-17 * Added CHANGES file. = 2.3.3 / 2011-01-17 * Added self to Match#captures hash. This means that a Match may retrieve a reference to itself by using its own label, proxy name, or index 0 in the hash. * Match#captures returns an empty array for unknown Symbol keys, coerces String keys to Symbols, and returns nil for unknown Numeric keys. * Moved Citrus::VERSION to its own file. * Citrus::LoadError is raised when Citrus is unable to load a file from the file system because it cannot be found or it is not readable. * Citrus::SyntaxError is raised when Citrus::File is unable to parse some Citrus syntax. * Added Citrus.require for requiring .citrus grammar files in a similar way to Ruby's Kernel.require. Also, overloaded the require operator in Citrus grammar files to failover to Citrus.require when Kernel.require raises a LoadError. * Improved UTF-8 support. 
citrus-3.0.2/Rakefile0000644000175000017500000000311013124403235013506 0ustar pravipravirequire 'rake/clean' require 'rake/testtask' task :default => :test # TESTS ####################################################################### Rake::TestTask.new(:test) do |t| t.test_files = FileList['test/**/*_test.rb'] end # DOCS ######################################################################## desc "Generate API documentation" task :api => 'lib/citrus.rb' do |t| output_dir = ENV['OUTPUT_DIR'] || 'api' rm_rf output_dir sh((<<-SH).gsub(/\s+/, ' ').strip) hanna --op #{output_dir} --promiscuous --charset utf8 --fmt html --inline-source --line-numbers --accessor option_accessor=RW --main Citrus --title 'Citrus API Documentation' #{t.prerequisites.join(' ')} SH end CLEAN.include 'api' # PACKAGING & INSTALLATION #################################################### if defined?(Gem) $spec = eval("#{File.read('citrus.gemspec')}") directory 'dist' def package(ext='') "dist/#{$spec.name}-#{$spec.version}" + ext end file package('.gem') => %w< dist > + $spec.files do |f| sh "gem build citrus.gemspec" mv File.basename(f.name), f.name end file package('.tar.gz') => %w< dist > + $spec.files do |f| sh "git archive --format=tar HEAD | gzip > #{f.name}" end desc "Build packages" task :package => %w< .gem .tar.gz >.map {|e| package(e) } desc "Build and install as local gem" task :install => package('.gem') do |t| sh "gem install #{package('.gem')}" end desc "Upload gem to rubygems.org" task :release => package('.gem') do |t| sh "gem push #{package('.gem')}" end end citrus-3.0.2/test/0000755000175000017500000000000013124403235013025 5ustar pravipravicitrus-3.0.2/test/and_predicate_test.rb0000644000175000017500000000171213124403235017174 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class AndPredicateTest < Test::Unit::TestCase def test_terminal? rule = AndPredicate.new assert_equal(false, rule.terminal?) 
end def test_exec rule = AndPredicate.new('abc') events = rule.exec(Input.new('abc')) assert_equal([rule, CLOSE, 0], events) end def test_exec_miss rule = AndPredicate.new('def') events = rule.exec(Input.new('abc')) assert_equal([], events) end def test_consumption rule = AndPredicate.new(Sequence.new(['a', 'b', 'c'])) input = Input.new('abc') events = rule.exec(input) assert_equal(0, input.pos) input = Input.new('def') events = rule.exec(input) assert_equal(0, input.pos) end def test_to_s rule = AndPredicate.new('a') assert_equal('&"a"', rule.to_s) end def test_to_s_with_label rule = AndPredicate.new('a') rule.label = 'a_label' assert_equal('a_label:&"a"', rule.to_s) end end citrus-3.0.2/test/string_terminal_test.rb0000644000175000017500000000261613124403235017617 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class StringTerminalTest < Test::Unit::TestCase def test_terminal? rule = StringTerminal.new assert(rule.terminal?) end def test_eql? rule = StringTerminal.new('abc') assert_equal('abc', rule) end def test_exec rule = StringTerminal.new('abc') events = rule.exec(Input.new('abc')) assert_equal([rule, CLOSE, 3], events) end def test_exec_miss rule = StringTerminal.new('abc') events = rule.exec(Input.new('def')) assert_equal([], events) end def test_exec_short rule = StringTerminal.new('abc') events = rule.exec(Input.new('ab')) assert_equal([], events) end def test_exec_long rule = StringTerminal.new('abc') events = rule.exec(Input.new('abcd')) assert_equal([rule, CLOSE, 3], events) end def test_exec_case_insensitive rule = StringTerminal.new('abc', Regexp::IGNORECASE) events = rule.exec(Input.new('abc')) assert_equal([rule, CLOSE, 3], events) events = rule.exec(Input.new('ABC')) assert_equal([rule, CLOSE, 3], events) events = rule.exec(Input.new('Abc')) assert_equal([rule, CLOSE, 3], events) end def test_to_s rule = StringTerminal.new('abc') assert_equal('"abc"', rule.to_s) end def test_to_s_case_insensitive rule = StringTerminal.new('abc', 
Regexp::IGNORECASE) assert_equal('`abc`', rule.to_s) end end citrus-3.0.2/test/multibyte_test.rb0000644000175000017500000000251413124403235016431 0ustar pravipravi# encoding: UTF-8 require File.expand_path('../helper', __FILE__) class MultibyteTest < Test::Unit::TestCase grammar :Multibyte do rule :string do "ä" end rule :regexp do /(ä)+/ end rule :character_class do /[ä]+/ end rule :dot do DOT end end def test_multibyte_string m = Multibyte.parse("ä", :root => :string) assert(m) end def test_multibyte_regexp m = Multibyte.parse("äää", :root => :regexp) assert(m) end def test_multibyte_character_class m = Multibyte.parse("äää", :root => :character_class) assert(m) end def test_multibyte_dot m = Multibyte.parse("ä", :root => :dot) assert(m) end Citrus.eval(<<-CITRUS) grammar Multibyte2 rule string "ä" end rule regexp /(ä)+/ end rule character_class [ä]+ end rule dot .+ end end CITRUS def test_multibyte2_string m = Multibyte2.parse("ä", :root => :string) assert(m) end def test_multibyte2_regexp m = Multibyte2.parse("äää", :root => :regexp) assert(m) end def test_multibyte2_character_class m = Multibyte2.parse("äää", :root => :character_class) assert(m) end def test_multibyte2_dot m = Multibyte2.parse("äää", :root => :dot) assert(m) end end citrus-3.0.2/test/parse_error_test.rb0000644000175000017500000000227013124403235016735 0ustar pravipravi# encoding: UTF-8 require File.expand_path('../helper', __FILE__) class ParseErrorTest < Test::Unit::TestCase Sentence = Grammar.new do include Words rule :sentence do all(:capital_word, one_or_more([ :space, :word ]), :period) end rule :capital_word do all(/[A-Z]/, zero_or_more(:alpha)) end rule :space do one_or_more(any(" ", "\n", "\r\n")) end rule :period, '.' 
end def test_basic begin TestGrammar.parse('#') rescue ParseError => e assert_equal(0, e.offset) assert_equal('#', e.line) assert_equal(1, e.line_number) assert_equal(0, e.line_offset) end end def test_single_line begin Sentence.parse('Once upon ä time.') rescue ParseError => e assert_equal(10, e.offset) assert_equal('Once upon ä time.', e.line) assert_equal(1, e.line_number) assert_equal(10, e.line_offset) end end def test_multi_line begin Sentence.parse("Once\nupon a\r\ntim3.") rescue ParseError => e assert_equal(16, e.offset) assert_equal('tim3.', e.line) assert_equal(3, e.line_number) assert_equal(3, e.line_offset) end end end citrus-3.0.2/test/alias_test.rb0000644000175000017500000000211713124403235015503 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class AliasTest < Test::Unit::TestCase def test_terminal? rule = Alias.new assert_equal(false, rule.terminal?) end def test_exec grammar = Grammar.new { rule :a, :b rule :b, 'abc' } rule_a = grammar.rule(:a) events = rule_a.exec(Input.new('abc')) assert_equal([rule_a, CLOSE, 3], events) end def test_exec_miss grammar = Grammar.new { rule :a, :b rule :b, 'abc' } rule = grammar.rule(:a) events = rule.exec(Input.new('def')) assert_equal([], events) end def test_exec_included grammar1 = Grammar.new { rule :a, 'abc' } grammar2 = Grammar.new { include grammar1 rule :b, :a } rule_b2 = grammar2.rule(:b) events = rule_b2.exec(Input.new('abc')) assert_equal([rule_b2, CLOSE, 3], events) end def test_to_s rule = Alias.new(:alpha) assert_equal('alpha', rule.to_s) end def test_to_s_with_label rule = Alias.new(:alpha) rule.label = 'a_label' assert_equal('a_label:alpha', rule.to_s) end end citrus-3.0.2/test/memoized_input_test.rb0000644000175000017500000000220113124403235017434 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class MemoizedInputTest < Test::Unit::TestCase def test_memoized? assert MemoizedInput.new('').memoized? 
end grammar :LetterA do rule :top do any(:three_as, :two_as, :one_a) end rule :three_as do rep(:one_a, 3, 3) end rule :two_as do rep(:one_a, 2, 2) end rule :one_a do "a" end end def test_cache_hits1 input = MemoizedInput.new('a') input.exec(LetterA.rule(:top)) assert_equal(3, input.cache_hits) end def test_cache_hits2 input = MemoizedInput.new('aa') input.exec(LetterA.rule(:top)) assert_equal(2, input.cache_hits) end def test_cache_hits3 input = MemoizedInput.new('aaa') input.exec(LetterA.rule(:top)) assert_equal(0, input.cache_hits) end grammar :LettersABC do rule :top do any(all(:a,:b,:c), all(:b,:a,:c), all(:b,:c,:a)) end rule :a do "a" end rule :b do "b" end rule :c do "c" end end def test_memoization match = LettersABC.parse('bca',{:memoize=>true}) assert_equal('bca',match) end end citrus-3.0.2/test/not_predicate_test.rb0000644000175000017500000000171213124403235017232 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class NotPredicateTest < Test::Unit::TestCase def test_terminal? rule = NotPredicate.new assert_equal(false, rule.terminal?) end def test_exec rule = NotPredicate.new('abc') events = rule.exec(Input.new('def')) assert_equal([rule, CLOSE, 0], events) end def test_exec_miss rule = NotPredicate.new('abc') events = rule.exec(Input.new('abc')) assert_equal([], events) end def test_consumption rule = NotPredicate.new(Sequence.new(['a', 'b', 'c'])) input = Input.new('abc') events = rule.exec(input) assert_equal(0, input.pos) input = Input.new('def') events = rule.exec(input) assert_equal(0, input.pos) end def test_to_s rule = NotPredicate.new('a') assert_equal('!"a"', rule.to_s) end def test_to_s_with_label rule = NotPredicate.new('a') rule.label = 'a_label' assert_equal('a_label:!"a"', rule.to_s) end end citrus-3.0.2/test/terminal_test.rb0000644000175000017500000000137613124403235016233 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class TerminalTest < Test::Unit::TestCase def test_terminal? 
    rule = Terminal.new
    assert(rule.terminal?)
  end

  def test_eql?
    rule = Terminal.new(/abc/i)
    assert_equal(rule, /abc/i)
  end

  def test_exec
    rule = Terminal.new(/\d+/)
    events = rule.exec(Input.new('123'))
    assert_equal([rule, CLOSE, 3], events)
  end

  def test_exec_long
    rule = Terminal.new(/\d+/)
    events = rule.exec(Input.new('123 456'))
    assert_equal([rule, CLOSE, 3], events)
  end

  def test_exec_miss
    rule = Terminal.new(/\d+/)
    events = rule.exec(Input.new(' 123'))
    assert_equal([], events)
  end

  def test_to_s
    rule = Terminal.new(/\d+/)
    assert_equal('/\\d+/', rule.to_s)
  end
end

citrus-3.0.2/test/sequence_test.rb

require File.expand_path('../helper', __FILE__)

class SequenceTest < Test::Unit::TestCase
  def test_terminal?
    rule = Sequence.new
    assert_equal(false, rule.terminal?)
  end

  def test_exec
    a = Rule.for('a')
    b = Rule.for('b')
    c = Rule.for('c')
    rule = Sequence.new([ a, b, c ])

    events = rule.exec(Input.new(''))
    assert_equal([], events)

    expected_events = [
      rule,
        a, CLOSE, 1,
        b, CLOSE, 1,
        c, CLOSE, 1,
      CLOSE, 3
    ]

    events = rule.exec(Input.new('abc'))
    assert_equal(expected_events, events)
  end

  def test_to_s
    rule = Sequence.new(%w<a b>)
    assert_equal('"a" "b"', rule.to_s)
  end

  def test_to_s_with_label
    rule = Sequence.new(%w<a b>)
    rule.label = 'a_label'
    assert_equal('a_label:("a" "b")', rule.to_s)
  end

  def test_to_embedded_s
    rule1 = Sequence.new(%w<a b>)
    rule2 = Sequence.new(%w<c d>)
    rule = Sequence.new([rule1, rule2])
    assert_equal('("a" "b") ("c" "d")', rule.to_s)
  end

  def test_to_embedded_s_with_label
    rule1 = Sequence.new(%w<a b>)
    rule2 = Sequence.new(%w<c d>)
    rule2.label = 'a_label'
    rule = Sequence.new([rule1, rule2])
    assert_equal('("a" "b") a_label:("c" "d")', rule.to_s)
  end
end

citrus-3.0.2/test/repeat_test.rb

require File.expand_path('../helper', __FILE__)

class RepeatTest < Test::Unit::TestCase
  def test_terminal?
    rule = Repeat.new
    assert_equal(false, rule.terminal?)
end def test_exec_zero_or_one abc = Rule.for('abc') rule = Repeat.new(abc, 0, 1) events = rule.exec(Input.new('')) assert_equal([rule, CLOSE, 0], events) events = rule.exec(Input.new('abc')) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) events = rule.exec(Input.new('abc' * 3)) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) end def test_exec_zero_or_more abc = Rule.for('abc') rule = Repeat.new(abc, 0, Infinity) events = rule.exec(Input.new('')) assert_equal([rule, CLOSE, 0], events) events = rule.exec(Input.new('abc')) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) expected_events = [ rule, abc, CLOSE, 3, abc, CLOSE, 3, abc, CLOSE, 3, CLOSE, 9 ] events = rule.exec(Input.new('abc' * 3)) assert_equal(expected_events, events) end def test_exec_one_or_more abc = Rule.for('abc') rule = Repeat.new(abc, 1, Infinity) events = rule.exec(Input.new('')) assert_equal([], events) events = rule.exec(Input.new('abc')) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) expected_events = [ rule, abc, CLOSE, 3, abc, CLOSE, 3, abc, CLOSE, 3, CLOSE, 9 ] events = rule.exec(Input.new('abc' * 3)) assert_equal(expected_events, events) end def test_exec_one abc = Rule.for('abc') rule = Repeat.new(abc, 1, 1) events = rule.exec(Input.new('')) assert_equal([], events) events = rule.exec(Input.new('abc')) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) events = rule.exec(Input.new('abc' * 3)) assert_equal([rule, abc, CLOSE, 3, CLOSE, 3], events) end def test_operator rule = Repeat.new('', 1, 2) assert_equal('1*2', rule.operator) end def test_operator_empty rule = Repeat.new('', 0, 0) assert_equal('', rule.operator) end def test_operator_asterisk rule = Repeat.new('', 0, Infinity) assert_equal('*', rule.operator) end def test_operator_question_mark rule = Repeat.new('', 0, 1) assert_equal('?', rule.operator) end def test_operator_plus rule = Repeat.new('', 1, Infinity) assert_equal('+', rule.operator) end def test_to_s rule = Repeat.new(/a/, 1, 2) 
assert_equal('/a/1*2', rule.to_s) end def test_to_s_asterisk rule = Repeat.new('a', 0, Infinity) assert_equal('"a"*', rule.to_s) end def test_to_s_question_mark rule = Repeat.new('a', 0, 1) assert_equal('"a"?', rule.to_s) end def test_to_s_plus rule = Repeat.new('a', 1, Infinity) assert_equal('"a"+', rule.to_s) end end citrus-3.0.2/test/grammars/0000755000175000017500000000000013124403235014636 5ustar pravipravicitrus-3.0.2/test/grammars/ipaddress_test.rb0000644000175000017500000000057413124403235020206 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'ipaddress' class IPAddressTest < Test::Unit::TestCase def test_v4 match = IPAddress.parse('1.2.3.4') assert(match) assert_equal(4, match.version) end def test_v6 match = IPAddress.parse('1:2:3:4::') assert(match) assert_equal(6, match.version) end end citrus-3.0.2/test/grammars/uri_test.rb0000644000175000017500000000221013124403235017014 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'uri' class UniformResourceIdentifierTest < Test::Unit::TestCase U = UniformResourceIdentifier def test_uri match = U.parse('http://www.example.com') assert(match) end def test_uri_with_query_string match = U.parse('http://www.example.com/?q=some+query') assert(match) end def test_authority match = U.parse('michael@', :root => :authority) assert(match) end def test_host match = U.parse('127.0.0.1', :root => :host) assert(match) match = U.parse('[12AD:34FC:A453:1922::]', :root => :host) assert(match) end def test_userinfo match = U.parse('michael', :root => :userinfo) assert(match) assert_raise(Citrus::ParseError) do U.parse('michael@', :root => :userinfo) end end def test_ipliteral match = U.parse('[12AD:34FC:A453:1922::]', :root => :'IP-literal') assert(match) end def test_ipvfuture match = U.parse('v1.123:456:789', :root => :IPvFuture) assert(match) match = U.parse('v5A.ABCD:1234', :root => :IPvFuture) assert(match) 
end end citrus-3.0.2/test/grammars/calc_test.rb0000644000175000017500000000350413124403235017126 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'calc' class CalcTest < Test::Unit::TestCase # A helper method that tests the successful parsing and evaluation of the # given mathematical expression. def do_test(expr) match = ::Calc.parse(expr) assert(match) assert_equal(expr, match) assert_equal(expr.length, match.length) assert_equal(eval(expr), match.value) end def test_int do_test('3') end def test_float do_test('1.5') end def test_addition do_test('1+2') end def test_addition_multi do_test('1+2+3') end def test_addition_float do_test('1.5+3') end def test_subtraction do_test('3-2') end def test_subtraction_float do_test('4.5-3') end def test_multiplication do_test('2*5') end def test_multiplication_float do_test('1.5*3') end def test_division do_test('20/5') end def test_division_float do_test('4.5/3') end def test_complex do_test('7*4+3.5*(4.5/3)') end def test_complex_spaced do_test('7 * 4 + 3.5 * (4.5 / 3)') end def test_complex_with_underscores do_test('(12_000 / 3) * 2.5') end def test_modulo do_test('3 % 2 + 4') end def test_exponent do_test('2**9') end def test_exponent_float do_test('2**2.2') end def test_negative_exponent do_test('2**-3') end def test_exponent_exponent do_test('2**2**2') end def test_exponent_group do_test('2**(3+1)') end def test_negative do_test('-5') end def test_double_negative do_test('--5') end def test_complement do_test('~4') end def test_double_complement do_test('~~4') end def test_mixed_unary do_test('~-4') end def test_complex_with_negatives do_test('4 * -7 / (8.0 + 1_2)**2') end end citrus-3.0.2/test/grammars/email_test.rb0000644000175000017500000001112413124403235017310 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'email' class EmailAddressTest < Test::Unit::TestCase def test_addr_spec_valid addresses = 
%w[ l3tt3rsAndNumb3rs@domain.com has-dash@domain.com hasApostrophe.o'leary@domain.org uncommonTLD@domain.museum uncommonTLD@domain.travel uncommonTLD@domain.mobi countryCodeTLD@domain.uk countryCodeTLD@domain.rw lettersInDomain@911.com underscore_inLocal@domain.net IPInsteadOfDomain@127.0.0.1 subdomain@sub.domain.com local@dash-inDomain.com dot.inLocal@foo.com a@singleLetterLocal.org singleLetterDomain@x.org &*=?^+{}'~@validCharsInLocal.net foor@bar.newTLD ] addresses.each do |address| match = EmailAddress.parse(address) assert(match) assert_equal(address, match) end end # NO-WS-CTL = %d1-8 / ; US-ASCII control characters # %d11 / ; that do not include the # %d12 / ; carriage return, line feed, # %d14-31 / ; and white space characters # %d127 def test_no_ws_ctl chars = chars_no_ws_ctl chars.each do |c| match = EmailAddress.parse(c, :root => :'NO-WS-CTL') assert(match) assert_equal(c, match) end end # quoted-pair = ("\" text) / obs-qp def test_quoted_pair chars = chars_quoted_pair chars.each do |c| match = EmailAddress.parse(c, :root => :'quoted-pair') assert(match) assert_equal(c, match) end end # atext = ALPHA / DIGIT / ; Printable US-ASCII # "!" / "#" / ; characters not including # "$" / "%" / ; specials. Used for atoms. # "&" / "'" / # "*" / "+" / # "-" / "/" / # "=" / "?" / # "^" / "_" / # "`" / "{" / # "|" / "}" / # "~" def test_atext chars = ('A'..'Z').to_a chars += ('a'..'z').to_a chars += ('0'..'9').to_a chars.push(*%w[! # $ % & ' * + - / = ? 
^ _ ` { | } ~]) chars.each do |c| match = EmailAddress.parse(c, :root => :atext) assert(match) assert_equal(c, match) end end # qtext = %d33 / ; Printable US-ASCII # %d35-91 / ; characters not including # %d93-126 / ; "\" or the quote character # obs-qtext def test_qtext chars = ["\x21"] chars += ("\x23".."\x5B").to_a chars += ("\x5D".."\x7E").to_a # obs-qtext chars += chars_obs_no_ws_ctl chars.each do |c| match = EmailAddress.parse(c, :root => :qtext) assert(match) assert_equal(c, match) end end # dtext = %d33-90 / ; Printable US-ASCII # %d94-126 / ; characters not including # obs-dtext ; "[", "]", or "\" def test_dtext chars = ("\x21".."\x5A").to_a chars += ("\x5E".."\x7E").to_a # obs-dtext chars += chars_obs_no_ws_ctl chars += chars_quoted_pair chars.each do |c| match = EmailAddress.parse(c, :root => :dtext) assert(match) assert_equal(c, match) end end # text = %d1-9 / ; Characters excluding CR # %d11 / ; and LF # %d12 / # %d14-127 def test_text chars = chars_text chars.each do |c| match = EmailAddress.parse(c, :root => :text) assert(match) assert_equal(c, match) end end # [\x01-\x08\x0B\x0C\x0E-\x1F\x7F] def chars_no_ws_ctl chars = ("\x01".."\x08").to_a chars << "\x0B" chars << "\x0C" chars += ("\x0E".."\x1F").to_a chars << "\x7F" chars end # [\x01-\x09\x0B\x0C\x0E-\x7F] def chars_text chars = ("\x01".."\x09").to_a chars << "\x0B" chars << "\x0C" chars += ("\x0E".."\x7F").to_a chars end # [\x01-\x08\x0B\x0C\x0E-\x1F\x7F] def chars_obs_no_ws_ctl chars_no_ws_ctl end # ("\\" text) | obs-qp def chars_quoted_pair chars = chars_text.map {|c| "\\" + c } chars += chars_obs_qp chars end # "\\" ("\x00" | obs-NO-WS-CTL | "\n" | "\r") def chars_obs_qp chars = ["\x00"] chars += chars_obs_no_ws_ctl chars << "\n" chars << "\r" chars.map {|c| "\\" + c } end end citrus-3.0.2/test/grammars/ipv6address_test.rb0000644000175000017500000000177113124403235020462 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'ipv6address' 
class IPv6AddressTest < Test::Unit::TestCase def test_hexdig match = IPv6Address.parse('0', :root => :HEXDIG) assert(match) match = IPv6Address.parse('A', :root => :HEXDIG) assert(match) end def test_1 match = IPv6Address.parse('1:2:3:4:5:6:7:8') assert(match) assert_equal(6, match.version) end def test_2 match = IPv6Address.parse('12AD:34FC:A453:1922::') assert(match) assert_equal(6, match.version) end def test_3 match = IPv6Address.parse('12AD::34FC') assert(match) assert_equal(6, match.version) end def test_4 match = IPv6Address.parse('12AD::') assert(match) assert_equal(6, match.version) end def test_5 match = IPv6Address.parse('::') assert(match) assert_equal(6, match.version) end def test_invalid assert_raise Citrus::ParseError do IPv6Address.parse('1:2') end end end citrus-3.0.2/test/grammars/ipv4address_test.rb0000644000175000017500000000164613124403235020461 0ustar pravipravirequire File.expand_path('../../helper', __FILE__) require 'citrus/grammars' Citrus.require 'ipv4address' class IPv4AddressTest < Test::Unit::TestCase def test_dec_octet match = IPv4Address.parse('0', :root => :'dec-octet') assert(match) match = IPv4Address.parse('255', :root => :'dec-octet') assert(match) end def test_1 match = IPv4Address.parse('0.0.0.0') assert(match) assert_equal(4, match.version) end def test_2 match = IPv4Address.parse('255.255.255.255') assert(match) assert_equal(4, match.version) end def test_invalid assert_raise Citrus::ParseError do IPv4Address.parse('255.255.255.256') end end def test_invalid_short assert_raise Citrus::ParseError do IPv4Address.parse('255.255.255') end end def test_invalid_long assert_raise Citrus::ParseError do IPv4Address.parse('255.255.255.255.255') end end end citrus-3.0.2/test/input_test.rb0000644000175000017500000000456713124403235015564 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class InputTest < Test::Unit::TestCase def test_new # to_str assert_equal('abc', Input.new('abc').string) # read selftext = 
::File.read(__FILE__) ::File.open(__FILE__, 'r') do |io| assert_equal(selftext, Input.new(io).string) end # to_path path = Struct.new(:to_path).new(__FILE__) assert_equal(selftext, Input.new(path).string) end def test_memoized? assert_equal(false, Input.new('').memoized?) end def test_offsets_new input = Input.new("abc\ndef\nghi") assert_equal(0, input.line_offset) assert_equal(0, input.line_index) assert_equal(1, input.line_number) assert_equal("abc\n", input.line) end def test_offsets_advanced input = Input.new("abc\ndef\nghi") input.pos = 6 assert_equal(2, input.line_offset) assert_equal(1, input.line_index) assert_equal(2, input.line_number) assert_equal("def\n", input.line) end def test_exec a = Rule.for('a') b = Rule.for('b') c = Rule.for('c') s = Rule.for([ a, b, c ]) r = Repeat.new(s, 0, Infinity) input = Input.new("abcabcabc") events = input.exec(r) expected_events = [ r, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, CLOSE, 9 ] assert_equal(expected_events, events) end def test_exec2 a = Rule.for('a') b = Rule.for('b') c = Choice.new([ a, b ]) r = Repeat.new(c, 0, Infinity) s = Rule.for([ a, r ]) input = Input.new('abbababba') events = input.exec(s) expected_events = [ s, a, CLOSE, 1, r, c, b, CLOSE, 1, CLOSE, 1, c, b, CLOSE, 1, CLOSE, 1, c, a, CLOSE, 1, CLOSE, 1, c, b, CLOSE, 1, CLOSE, 1, c, a, CLOSE, 1, CLOSE, 1, c, b, CLOSE, 1, CLOSE, 1, c, b, CLOSE, 1, CLOSE, 1, c, a, CLOSE, 1, CLOSE, 1, CLOSE, 8, CLOSE, 9 ] assert_equal(expected_events, events) end end citrus-3.0.2/test/grammar_test.rb0000644000175000017500000000677613124403235016057 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class GrammarTest < Test::Unit::TestCase def test_new grammar = Grammar.new assert_kind_of(Module, grammar) assert(grammar.include?(Grammar)) end def test_name assert_equal("Test::Unit::TestCase::TestGrammar", TestGrammar.name) end def test_no_name 
grammar = Grammar.new assert_equal('', grammar.name) end def test_rule_names assert_equal([:alpha, :num, :alphanum], TestGrammar.rule_names) end def test_has_name assert(TestGrammar.has_rule?('alpha')) assert(TestGrammar.has_rule?(:alpha)) end def test_doesnt_have_name assert_equal(false, TestGrammar.has_rule?(:value)) end def test_parse_fixed_width grammar = Grammar.new { rule(:abc) { 'abc' } } match = grammar.parse('abc') assert(match) assert_equal('abc', match) assert_equal(3, match.length) end def test_parse_expression grammar = Grammar.new { rule(:alpha) { /[a-z]+/i } } match = grammar.parse('abc') assert(match) assert_equal('abc', match) assert_equal(3, match.length) end def test_parse_sequence grammar = Grammar.new { rule(:num) { all(1, 2, 3) } } match = grammar.parse('123') assert(match) assert_equal('123', match) assert_equal(3, match.length) end def test_parse_sequence_long grammar = Grammar.new { rule(:num) { all(1, 2, 3) } } match = grammar.parse('1234', :consume => false) assert(match) assert_equal('123', match) assert_equal(3, match.length) end def test_parse_sequence_short grammar = Grammar.new { rule(:num) { all(1, 2, 3) } } assert_raise ParseError do grammar.parse('12') end end def test_parse_choice grammar = Grammar.new { rule(:alphanum) { any(/[a-z]/i, 0..9) } } match = grammar.parse('a') assert(match) assert_equal('a', match) assert_equal(1, match.length) match = grammar.parse('1') assert(match) assert_equal('1', match) assert_equal(1, match.length) end def test_parse_choice_miss grammar = Grammar.new { rule(:alphanum) { any(/[a-z]/, 0..9) } } assert_raise ParseError do grammar.parse('A') end end def test_parse_recurs grammar = Grammar.new { rule(:paren) { any(['(', :paren, ')'], /[a-z]/) } } match = grammar.parse('a') assert(match) assert_equal('a', match) assert_equal(1, match.length) match = grammar.parse('((a))') assert(match) assert_equal('((a))', match) assert_equal(5, match.length) n = 100 str = ('(' * n) + 'a' + (')' * n) match = 
grammar.parse(str) assert(match) assert_equal(str, match) assert_equal(str.length, match.length) end def test_parse_file grammar = Grammar.new { rule("words"){ rep(any(" ", /[a-z]+/)) } } require 'tempfile' Tempfile.open('citrus') do |tmp| tmp << "abd def" tmp.close match = grammar.parse_file(tmp.path) assert(match) assert_instance_of(Input, match.input) assert_instance_of(Pathname, match.source) match.matches.each do |m| assert_instance_of(Input, m.input) assert_instance_of(Pathname, m.source) end end end def test_labeled_production grammar = Grammar.new { rule(:abc) { label('abc', :p){ capture(:p) } } } assert_equal('abc', grammar.parse('abc').value) end def test_global_grammar assert_raise ArgumentError do grammar(:abc) end end end citrus-3.0.2/test/helper.rb0000644000175000017500000000131513124403235014631 0ustar pravipravilib = File.expand_path('../../lib', __FILE__) $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib) require 'test/unit' require 'citrus/core_ext' class Test::Unit::TestCase include Citrus TestGrammar = Grammar.new do rule :alpha do /[a-zA-Z]/ end rule :num do ext(/[0-9]/) { to_i } end rule :alphanum do any(:alpha, :num) end end Double = Grammar.new do include TestGrammar root :double rule :double do one_or_more(:num) end end Words = Grammar.new do include TestGrammar root :words rule :word do one_or_more(:alpha) end rule :words do [ :word, zero_or_more([ ' ', :word ]) ] end end end citrus-3.0.2/test/match_test.rb0000644000175000017500000000707213124403235015513 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class MatchTest < Test::Unit::TestCase def test_string_equality match = Match.new('hello') assert_equal('hello', match) end def test_match_equality match1 = Match.new('a') match2 = Match.new('a') assert(match1 == match2) assert(match2 == match1) end def test_match_inequality match1 = Match.new('a') match2 = Match.new('b') assert_equal(false, match1 == match2) assert_equal(false, match2 == match1) end def test_source 
match1 = Match.new('abcdef') assert_equal 'abcdef', match1.source path = Struct.new(:to_path).new(__FILE__) match2 = Match.new(Input.new(path)) assert_equal path, match2.source end def test_string match1 = Match.new('abcdef') assert_equal 'abcdef', match1.string match2 = Match.new('abcdef', [Rule.for('bcd'), -1, 3], 1) assert_equal 'bcd', match2.string end def test_matches a = Rule.for('a') b = Rule.for('b') c = Rule.for('c') s = Rule.for([ a, b, c ]) r = Repeat.new(s, 0, Infinity) events = [ r, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3, CLOSE, 9 ] match = Match.new("abcabcabc", events) assert(match.matches) assert_equal(3, match.matches.length) sub_events = [ s, a, CLOSE, 1, b, CLOSE, 1, c, CLOSE, 1, CLOSE, 3 ] match.matches.each_with_index do |m, i| assert_equal(sub_events, m.events) assert_equal(i*3, m.offset) assert_equal(3, m.length) assert_equal("abc", m.string) assert_equal("abc", m) assert(m.matches) assert_equal(3, m.matches.length) m.matches.each_with_index do |m2,i2| assert_equal(i*3+i2, m2.offset) assert_equal(1, m2.length) end end end grammar :Addition do rule :additive do all(:number, :plus, label(any(:additive, :number), 'term')) { capture(:number).value + capture(:term).value } end rule :number do all(/[0-9]+/, :space) { to_str.strip.to_i } end rule :plus do all('+', :space) end rule :space do /[ \t]*/ end end def test_matches2 match = Addition.parse('+', :root => :plus) assert(match) assert(match.matches) assert_equal(2, match.matches.length) match = Addition.parse('+ ', :root => :plus) assert(match) assert(match.matches) assert_equal(2, match.matches.length) match = Addition.parse('99', :root => :number) assert(match) assert(match.matches) assert_equal(2, match.matches.length) match = Addition.parse('99 ', :root => :number) assert(match) assert(match.matches) assert_equal(2, match.matches.length) match = Addition.parse('1+2') 
assert(match) assert(match.matches) assert_equal(3, match.matches.length) end def test_capture match = Addition.parse('1+2') assert(match.capture(:number)) assert_equal(1, match.capture(:number).value) end def test_captures match = Addition.parse('1+2') assert(match.captures) assert_equal(7, match.captures.size) end def test_captures_with_a_name match = Addition.parse('1+2') assert(match.captures(:number)) assert_equal(2, match.captures(:number).size) assert_equal([1, 2], match.captures(:number).map(&:value)) end end citrus-3.0.2/test/label_test.rb0000644000175000017500000000104613124403235015471 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class LabelTest < Test::Unit::TestCase def test_to_s rule = Rule.for('a') rule.label = 'a_label' assert_equal('a_label:"a"', rule.to_s) end def test_to_s_sequence rule = Sequence.new(%w< a b >) rule.label = 's_label' assert_equal('s_label:("a" "b")', rule.to_s) end def test_to_s_embedded a = Rule.for('a') a.label = 'a_label' rule = Sequence.new([ a, 'b' ]) rule.label = 's_label' assert_equal('s_label:(a_label:"a" "b")', rule.to_s) end end citrus-3.0.2/test/choice_test.rb0000644000175000017500000000176013124403235015647 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class ChoiceTest < Test::Unit::TestCase def test_terminal? rule = Choice.new assert_equal(false, rule.terminal?) 
  end

  def test_exec
    a = Rule.for('a')
    b = Rule.for('b')
    rule = Choice.new([ a, b ])

    events = rule.exec(Input.new(''))
    assert_equal([], events)

    events = rule.exec(Input.new('a'))
    assert(events)
    assert_equal([rule, a, CLOSE, 1, CLOSE, 1], events)

    events = rule.exec(Input.new('b'))
    assert(events)
    assert_equal([rule, b, CLOSE, 1, CLOSE, 1], events)
  end

  def test_to_s
    rule = Choice.new(%w<a b>)
    assert_equal('"a" | "b"', rule.to_s)
  end

  def test_to_embedded_s
    rule1 = Choice.new(%w<a b>)
    rule2 = Choice.new(%w<c d>)
    rule = Choice.new([rule1, rule2])
    assert_equal('("a" | "b") | ("c" | "d")', rule.to_s)
  end

  def test_to_s_with_label
    rule = Choice.new(%w<a b>)
    rule.label = 'a_label'
    assert_equal('a_label:("a" | "b")', rule.to_s)
  end
end

citrus-3.0.2/test/super_test.rb

require File.expand_path('../helper', __FILE__)

class SuperTest < Test::Unit::TestCase
  def test_terminal?
    rule = Super.new
    assert_equal(false, rule.terminal?)
  end

  def test_exec
    ghi = Rule.for('ghi')
    grammar1 = Grammar.new {
      rule :a, 'abc'
    }
    grammar2 = Grammar.new {
      include grammar1
      rule :a, any(ghi, sup)
    }
    rule_2a = grammar2.rule(:a)
    rule_2a_als = rule_2a.rules[0]
    rule_2a_sup = rule_2a.rules[1]

    events = rule_2a.exec(Input.new('abc'))
    assert_equal([
      rule_2a,
        rule_2a_sup, CLOSE, 3,
      CLOSE, 3
    ], events)

    events = rule_2a.exec(Input.new('ghi'))
    assert_equal([
      rule_2a,
        rule_2a_als, CLOSE, 3,
      CLOSE, 3
    ], events)
  end

  def test_exec_miss
    grammar1 = Grammar.new {
      rule :a, 'abc'
    }
    grammar2 = Grammar.new {
      include grammar1
      rule :a, any('def', sup)
    }
    rule_2a = grammar2.rule(:a)

    events = rule_2a.exec(Input.new('ghi'))
    assert_equal([], events)
  end

  def test_exec_aliased
    grammar1 = Grammar.new {
      rule :a, 'abc'
      rule :b, 'def'
    }
    grammar2 = Grammar.new {
      include grammar1
      rule :a, any(sup, :b)
      rule :b, sup
    }
    rule_2a = grammar2.rule(:a)
    rule_2a_sup = rule_2a.rules[0]
    rule_2a_als = rule_2a.rules[1]

    events = rule_2a.exec(Input.new('abc'))
    assert_equal([
      rule_2a,
        rule_2a_sup, CLOSE, 3,
      CLOSE, 3
    ],
events) events = rule_2a.exec(Input.new('def')) assert_equal([ rule_2a, rule_2a_als, CLOSE, 3, CLOSE, 3 ], events) end def test_to_s rule = Super.new assert_equal('super', rule.to_s) end def test_to_s_with_label rule = Super.new rule.label = 'a_label' assert_equal('a_label:super', rule.to_s) end end citrus-3.0.2/test/file_test.rb0000644000175000017500000005143213124403235015335 0ustar pravipravirequire File.expand_path('../helper', __FILE__) class CitrusFileTest < Test::Unit::TestCase ## File tests def run_file_test(file, root) match = File.parse(::File.read(file), :root => root) assert(match) end %w.each do |type| Dir[::File.dirname(__FILE__) + "/_files/#{type}*.citrus"].each do |path| module_eval(<<-CODE.gsub(/^ /, ''), __FILE__, __LINE__ + 1) def test_#{::File.basename(path, '.citrus')} run_file_test("#{path}", :#{type}) end CODE end end ## Hierarchical syntax def test_expression_empty assert_raise SyntaxError do File.parse('', :root => :expression) end end def test_expression_alias match = File.parse('rule_name', :root => :expression) assert(match) assert_instance_of(Alias, match.value) end def test_expression_dot match = File.parse('.', :root => :expression) assert(match) assert_instance_of(Terminal, match.value) end def test_expression_character_range match = File.parse('[a-z]', :root => :expression) assert(match) assert_instance_of(Terminal, match.value) end def test_expression_terminal match = File.parse('/./', :root => :expression) assert(match) assert_instance_of(Terminal, match.value) end def test_expression_string_terminal_empty match = File.parse('""', :root => :expression) assert(match) assert_instance_of(StringTerminal, match.value) end def test_expression_string_terminal match = File.parse('"a"', :root => :expression) assert(match) assert_instance_of(StringTerminal, match.value) end def test_expression_string_terminal_empty_block match = File.parse('"" {}', :root => :expression) assert(match) assert_instance_of(StringTerminal, match.value) end def 
test_expression_repeat_string_terminal
    match = File.parse('"a"*', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_empty_string_terminal_block
    match = File.parse('""* {}', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_sequence
    match = File.parse('("a" "b")*', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_choice
    match = File.parse('("a" | "b")*', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_sequence_block
    match = File.parse('("a" "b")* {}', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_choice_block
    match = File.parse('("a" | "b")* {}', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_sequence_extension
    match = File.parse('("a" "b")* <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_sequence_extension_spaced
    match = File.parse('( "a" "b" )* <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_repeat_choice_extension
    match = File.parse('("a" | "b")* <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_expression_choice_terminal
    match = File.parse('/./ | /./', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_expression_choice_string_terminal
    match = File.parse('"a" | "b"', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_expression_choice_embedded_sequence
    match = File.parse('"a" | ("b" "c")', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_expression_choice_mixed
    match = File.parse('("a" | /./)', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_expression_choice_extended
    match = File.parse('("a" | "b") <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_expression_sequence_terminal
    match = File.parse('/./ /./', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_expression_sequence_string_terminal
    match = File.parse('"a" "b"', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_expression_sequence_extension
    match = File.parse('( "a" "b" ) <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_expression_sequence_mixed
    match = File.parse('"a" ("b" | /./)* <Module>', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_expression_sequence_block
    match = File.parse('"a" ("b" | /./)* {}', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_precedence_sequence_before_choice
    # Sequence should bind more tightly than Choice.
    match = File.parse('"a" "b" | "c"', :root => :expression)
    assert(match)
    assert_instance_of(Choice, match.value)
  end

  def test_precedence_parentheses
    # Parentheses should change binding precedence.
    match = File.parse('"a" ("b" | "c")', :root => :expression)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_precedence_repeat_before_predicate
    # Repeat should bind more tightly than AndPredicate.
    match = File.parse("&'a'+", :root => :expression)
    assert(match)
    assert_instance_of(AndPredicate, match.value)
  end

  def test_sequence
    match = File.parse('"" ""', :root => :sequence)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_sequence_embedded_choice
    match = File.parse('"a" ("b" | "c")', :root => :sequence)
    assert(match)
    assert_instance_of(Sequence, match.value)
  end

  def test_labelled
    match = File.parse('label:""', :root => :labelled)
    assert(match)
    assert_instance_of(StringTerminal, match.value)
  end

  def test_extended_tag
    match = File.parse('"" <Module>', :root => :extended)
    assert(match)
    assert_kind_of(Rule, match.value)
    assert_kind_of(Module, match.value.extension)
  end

  def test_extended_block
    match = File.parse('"" {}', :root => :extended)
    assert(match)
    assert_kind_of(Rule, match.value)
    assert_kind_of(Module, match.value.extension)
  end

  def test_prefix_and
    match = File.parse('&""', :root => :prefix)
    assert(match)
    assert_instance_of(AndPredicate, match.value)
  end

  def test_prefix_not
    match = File.parse('!""', :root => :prefix)
    assert(match)
    assert_instance_of(NotPredicate, match.value)
  end

  def test_prefix_but
    match = File.parse('~""', :root => :prefix)
    assert(match)
    assert_instance_of(ButPredicate, match.value)
  end

  def test_suffix_plus
    match = File.parse('""+', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_suffix_question
    match = File.parse('""?', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_suffix_star
    match = File.parse('""*', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_suffix_n_star
    match = File.parse('""1*', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_suffix_star_n
    match = File.parse('""*2', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_suffix_n_star_n
    match = File.parse('""1*2', :root => :suffix)
    assert(match)
    assert_instance_of(Repeat, match.value)
  end

  def test_primary_alias
    match = File.parse('rule_name', :root => :primary)
    assert(match)
    assert_instance_of(Alias, match.value)
  end

  def test_primary_string_terminal
    match = File.parse('"a"', :root => :primary)
    assert(match)
    assert_instance_of(StringTerminal, match.value)
  end

  ## Lexical syntax

  def test_require
    match = File.parse('require "some/file"', :root => :require)
    assert(match)
    assert_equal('some/file', match.value)
  end

  def test_require_no_space
    match = File.parse('require"some/file"', :root => :require)
    assert(match)
    assert_equal('some/file', match.value)
  end

  def test_require_single_quoted
    match = File.parse("require 'some/file'", :root => :require)
    assert(match)
    assert_equal('some/file', match.value)
  end

  def test_include
    match = File.parse('include Module', :root => :include)
    assert(match)
    assert_equal(Module, match.value)
  end

  def test_include_colon_colon
    match = File.parse('include ::Module', :root => :include)
    assert(match)
    assert_equal(Module, match.value)
  end

  def test_root
    match = File.parse('root some_rule', :root => :root)
    assert(match)
    assert_equal('some_rule', match.value)
  end

  def test_root_invalid
    assert_raise SyntaxError do
      File.parse('root :a_root', :root => :root)
    end
  end

  def test_rule_name
    match = File.parse('some_rule', :root => :rule_name)
    assert(match)
    # assert(expected, actual) always passes (the second argument is only a
    # failure message), so compare the values explicitly.
    assert_equal('some_rule', match.value)
  end

  def test_rule_name_space
    match = File.parse('some_rule ', :root => :rule_name)
    assert(match)
    assert_equal('some_rule', match.value)
  end

  def test_terminal_single_quoted_string
    match = File.parse("'a'", :root => :terminal)
    assert(match)
    assert_instance_of(StringTerminal, match.value)
  end

  def test_terminal_double_quoted_string
    match = File.parse('"a"', :root => :terminal)
    assert(match)
    assert_instance_of(StringTerminal, match.value)
  end

  def test_terminal_case_insensitive_string
    match = File.parse('`a`', :root => :terminal)
    assert(match)
    assert_instance_of(StringTerminal, match.value)
  end

  def test_terminal_regular_expression
    match = File.parse('/./', :root =>
      :terminal)
    assert(match)
    assert_instance_of(Terminal, match.value)
  end

  def test_terminal_character_class
    match = File.parse('[a-z]', :root => :terminal)
    assert(match)
    assert_instance_of(Terminal, match.value)
  end

  def test_terminal_dot
    match = File.parse('.', :root => :terminal)
    assert(match)
    assert_instance_of(Terminal, match.value)
  end

  def test_single_quoted_string
    match = File.parse("'test'", :root => :quoted_string)
    assert(match)
    assert_equal('test', match.value)
  end

  def test_single_quoted_string_embedded_single_quote
    match = File.parse("'te\\'st'", :root => :quoted_string)
    assert(match)
    assert_equal("te'st", match.value)
  end

  def test_single_quoted_string_embedded_double_quote
    match = File.parse("'te\"st'", :root => :quoted_string)
    assert(match)
    assert_equal('te"st', match.value)
  end

  def test_double_quoted_string
    match = File.parse('"test"', :root => :quoted_string)
    assert(match)
    assert_equal('test', match.value)
  end

  def test_double_quoted_string_embedded_double_quote
    match = File.parse('"te\\"st"', :root => :quoted_string)
    assert(match)
    assert_equal('te"st', match.value)
  end

  def test_double_quoted_string_embedded_single_quote
    match = File.parse('"te\'st"', :root => :quoted_string)
    assert(match)
    assert_equal("te'st", match.value)
  end

  def test_double_quoted_string_hex
    match = File.parse('"\\x26"', :root => :quoted_string)
    assert(match)
    assert_equal('&', match.value)
  end

  def test_case_insensitive_string
    match = File.parse('`test`', :root => :case_insensitive_string)
    assert(match)
    assert_equal('test', match.value)
  end

  def test_case_insensitive_string_embedded_double_quote
    match = File.parse('`te\\"st`', :root => :case_insensitive_string)
    assert(match)
    assert_equal('te"st', match.value)
  end

  def test_case_insensitive_string_embedded_backtick
    match = File.parse('`te\`st`', :root => :case_insensitive_string)
    assert(match)
    assert_equal("te`st", match.value)
  end

  def test_case_insensitive_string_hex
    match = File.parse('`\\x26`', :root => :case_insensitive_string)
    assert(match)
    assert_equal('&', match.value)
  end

  def test_regular_expression
    match = File.parse('/./', :root => :regular_expression)
    assert(match)
    assert_equal(/./, match.value)
  end

  def test_regular_expression_escaped_forward_slash
    match = File.parse('/\\//', :root => :regular_expression)
    assert(match)
    assert_equal(/\//, match.value)
  end

  def test_regular_expression_escaped_backslash
    match = File.parse('/\\\\/', :root => :regular_expression)
    assert(match)
    assert_equal(/\\/, match.value)
  end

  def test_regular_expression_hex
    match = File.parse('/\\x26/', :root => :regular_expression)
    assert(match)
    assert_equal(/\x26/, match.value)
  end

  def test_regular_expression_with_flag
    match = File.parse('/a/i', :root => :regular_expression)
    assert(match)
    assert_equal(/a/i, match.value)
  end

  def test_character_class
    match = File.parse('[_]', :root => :character_class)
    assert(match)
    assert_equal(/[_]/n, match.value)
  end

  def test_character_class_a_z
    match = File.parse('[a-z]', :root => :character_class)
    assert(match)
    assert_equal(/[a-z]/n, match.value)
  end

  def test_character_class_nested_square_brackets
    match = File.parse('[\[-\]]', :root => :character_class)
    assert(match)
    assert_equal(/[\[-\]]/n, match.value)
  end

  def test_character_class_hex_range
    match = File.parse('[\\x26-\\x29]', :root => :character_class)
    assert(match)
    assert_equal(/[\x26-\x29]/, match.value)
  end

  def test_dot
    match = File.parse('.', :root => :dot)
    assert(match)
    assert_equal(DOT, match.value)
  end

  def test_label
    match = File.parse('label:', :root => :label)
    assert(match)
    assert_equal(:label, match.value)
  end

  def test_label_spaced
    match = File.parse('a_label : ', :root => :label)
    assert(match)
    assert_equal(:a_label, match.value)
  end

  def test_tag
    match = File.parse('<Module>', :root => :tag)
    assert(match)
    assert_equal(Module, match.value)
  end

  def test_tag_inner_space
    match = File.parse('< Module >', :root => :tag)
    assert(match)
    assert_equal(Module, match.value)
  end

  def test_tag_space
    match = File.parse('<Module> ', :root => :tag)
    assert(match)
    assert_equal(Module, match.value)
  end

  def test_block
    match = File.parse('{}', :root => :block)
    assert(match)
    assert(match.value)
  end

  def test_block_space
    match = File.parse("{} \n", :root => :block)
    assert(match)
    assert(match.value)
  end

  def test_block_n
    match = File.parse('{ 2 }', :root => :block)
    assert(match)
    assert(match.value)
    assert_equal(2, match.value.call)
  end

  def test_block_with_hash
    match = File.parse("{ {:a => :b}\n}", :root => :block)
    assert(match)
    assert(match.value)
    assert_equal({:a => :b}, match.value.call)
  end

  def test_block_proc
    match = File.parse("{|b|\n Proc.new(&b)\n}", :root => :block)
    assert(match)
    assert(match.value)

    b = match.value.call(Proc.new { :hi })

    assert(b)
    assert_equal(:hi, b.call)
  end

  def test_block_def
    match = File.parse("{def value; 'a' end}", :root => :block)
    assert(match)
    assert(match.value)
    assert_instance_of(Module, match.value)
    method_names = match.value.instance_methods.map {|m| m.to_sym }
    assert_equal([:value], method_names)
  end

  def test_block_def_multiline
    match = File.parse("{\n def value\n 'a'\n end\n} ", :root => :block)
    assert(match)
    assert(match.value)
    assert_instance_of(Module, match.value)
    method_names = match.value.instance_methods.map {|m| m.to_sym }
    assert_equal([:value], method_names)
  end

  def test_block_with_interpolation
    match = File.parse('{ "#{number}" }', :root => :block)
    assert(match)
    assert(match.value)
  end

  def test_predicate_and
    match = File.parse('&', :root => :predicate)
    assert(match)
    assert_instance_of(AndPredicate, match.value(''))
  end

  def test_predicate_not
    match = File.parse('!', :root => :predicate)
    assert(match)
    assert_instance_of(NotPredicate, match.value(''))
  end

  def test_predicate_but
    match = File.parse('~', :root => :predicate)
    assert(match)
    assert_instance_of(ButPredicate, match.value(''))
  end

  def test_and
    match = File.parse('&', :root => :and)
    assert(match)
    assert_instance_of(AndPredicate, match.value(''))
  end

  def test_and_space
    match = File.parse('& ', :root => :and)
    assert(match)
    assert_instance_of(AndPredicate, match.value(''))
  end

  def test_not
    match = File.parse('!', :root => :not)
    assert(match)
    assert_instance_of(NotPredicate, match.value(''))
  end

  def test_not_space
    match = File.parse('! ', :root => :not)
    assert(match)
    assert_instance_of(NotPredicate, match.value(''))
  end

  def test_but
    match = File.parse('~', :root => :but)
    assert(match)
    assert_instance_of(ButPredicate, match.value(''))
  end

  def test_but_space
    match = File.parse('~ ', :root => :but)
    assert(match)
    assert_instance_of(ButPredicate, match.value(''))
  end

  def test_repeat_question
    match = File.parse('?', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_repeat_plus
    match = File.parse('+', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_repeat_star
    match = File.parse('*', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_repeat_n_star
    match = File.parse('1*', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_repeat_star_n
    match = File.parse('*2', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_repeat_n_star_n
    match = File.parse('1*2', :root => :repeat)
    assert(match)
    assert_instance_of(Repeat, match.value(''))
  end

  def test_question
    match = File.parse('?', :root => :question)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(0, rule.min)
    assert_equal(1, rule.max)
  end

  def test_question_space
    match = File.parse('? ', :root => :question)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(0, rule.min)
    assert_equal(1, rule.max)
  end

  def test_plus
    match = File.parse('+', :root => :plus)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(1, rule.min)
    assert_equal(Infinity, rule.max)
  end

  def test_plus_space
    match = File.parse('+ ', :root => :plus)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(1, rule.min)
    assert_equal(Infinity, rule.max)
  end

  def test_star
    match = File.parse('*', :root => :star)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(0, rule.min)
    assert_equal(Infinity, rule.max)
  end

  def test_n_star
    match = File.parse('1*', :root => :star)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(1, rule.min)
    assert_equal(Infinity, rule.max)
  end

  def test_star_n
    match = File.parse('*2', :root => :star)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(0, rule.min)
    assert_equal(2, rule.max)
  end

  def test_n_star_n
    match = File.parse('1*2', :root => :star)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(1, rule.min)
    assert_equal(2, rule.max)
  end

  def test_n_star_n_space
    match = File.parse('1*2 ', :root => :star)
    assert(match)
    rule = match.value('')
    assert_instance_of(Repeat, rule)
    assert_equal(1, rule.min)
    assert_equal(2, rule.max)
  end

  def test_module_name
    match = File.parse('Module', :root => :module_name)
    assert(match)
    assert_equal('Module', match)
  end

  def test_module_name_space
    match = File.parse('Module ', :root => :module_name)
    assert(match)
    assert_equal('Module', match.first)
  end

  def test_module_name_colon_colon
    match = File.parse('::Proc', :root => :module_name)
    assert(match)
    assert_equal('::Proc', match)
  end

  def test_constant
    match = File.parse('Math', :root => :constant)
    assert(match)
    assert_equal('Math', match)
  end

  def test_constant_invalid
    assert_raise SyntaxError do
      File.parse('math', :root => :constant)
    end
  end

  def test_comment
    match = File.parse('# A comment.', :root => :comment)
    assert(match)
    assert_equal('# A comment.', match)
  end
end
citrus-3.0.2/test/extension_test.rb0000644000175000017500000000152013124403235016423 0ustar pravipravi
require File.expand_path('../helper', __FILE__)

class ExtensionTest < Test::Unit::TestCase
  module MatchModule
    def a_test
      :test
    end
  end

  module NumericModule
    def add_one
      to_str.to_i + 1
    end
  end

  NumericProcBare = Proc.new {
    to_str.to_i + 1
  }

  def test_match_module
    rule = StringTerminal.new('abc')
    rule.extension = MatchModule
    match = rule.parse('abc')
    assert(match)
    assert_equal(:test, match.a_test)
  end

  def test_numeric_module
    rule = StringTerminal.new('1')
    rule.extension = NumericModule
    match = rule.parse('1')
    assert(match)
    assert_equal(2, match.add_one)
  end

  def test_numeric_proc_bare
    rule = StringTerminal.new('1')
    rule.extension = NumericProcBare
    match = rule.parse('1')
    assert(match)
    assert_equal(2, match.value)
  end
end
citrus-3.0.2/test/but_predicate_test.rb0000644000175000017500000000204513124403235017224 0ustar pravipravi
require File.expand_path('../helper', __FILE__)

class ButPredicateTest < Test::Unit::TestCase
  def test_terminal?
    rule = ButPredicate.new
    assert_equal(false, rule.terminal?)
  end

  def test_exec
    rule = ButPredicate.new('abc')

    events = rule.exec(Input.new('def'))
    assert_equal([rule, CLOSE, 3], events)

    events = rule.exec(Input.new('defabc'))
    assert_equal([rule, CLOSE, 3], events)
  end

  def test_exec_miss
    rule = ButPredicate.new('abc')
    events = rule.exec(Input.new('abc'))
    assert_equal([], events)
  end

  def test_consumption
    rule = ButPredicate.new(Sequence.new(['a', 'b', 'c']))

    input = Input.new('def')
    events = rule.exec(input)
    assert_equal(3, input.pos)

    input = Input.new('defabc')
    events = rule.exec(input)
    assert_equal(3, input.pos)
  end

  def test_to_s
    rule = ButPredicate.new('a')
    assert_equal('~"a"', rule.to_s)
  end

  def test_to_s_with_label
    rule = ButPredicate.new('a')
    rule.label = 'a_label'
    assert_equal('a_label:~"a"', rule.to_s)
  end
end
citrus-3.0.2/test/_files/0000755000175000017500000000000013124403235014266 5ustar pravipravi
citrus-3.0.2/test/_files/file3.citrus0000644000175000017500000000016213124403235016522 0ustar pravipravi
grammar SuperThree
  rule keyword
    super_keyword | "keyword"
  end

  rule super_keyword
    "super"
  end
end
citrus-3.0.2/test/_files/grammar2.citrus0000644000175000017500000000045013124403235017230 0ustar pravipravi
grammar Calc
  rule number
    (float | integer) {
      def value
        first.value
      end
    }
  end

  rule float
    (integer '.'
     integer) {
      def value
        text.to_f
      end
    }
  end

  rule integer
    [0-9]+ {
      def value
        text.to_i
      end
    }
  end
end
citrus-3.0.2/test/_files/rule3.citrus0000644000175000017500000000010213124403235016544 0ustar pravipravi
rule int
  [0-9]+ {
    def value
      text.to_i
    end
  }
end
citrus-3.0.2/test/_files/rule7.citrus0000644000175000017500000000003513124403235016555 0ustar pravipravi
rule abc
  "abc" | super
end
citrus-3.0.2/test/_files/grammar1.citrus0000644000175000017500000000002113124403235017221 0ustar pravipravi
grammar Calc
end
citrus-3.0.2/test/_files/file1.citrus0000644000175000017500000000017413124403235016523 0ustar pravipravi
grammar SuperOne
  rule num
    '1'
  end
end

grammar SuperOneSub
  include SuperOne

  rule num
    '2' | super
  end
end
citrus-3.0.2/test/_files/rule2.citrus0000644000175000017500000000002613124403235016550 0ustar pravipravi
rule int
  [0-9]+
end
citrus-3.0.2/test/_files/alias.citrus0000644000175000017500000000011613124403235016610 0ustar pravipravi
grammar AliasOne
  rule alias
    value
  end

  rule value
    'a'
  end
end
citrus-3.0.2/test/_files/rule4.citrus0000644000175000017500000000020013124403235016544 0ustar pravipravi
rule method
  (def space method_name statements ends) {
    "#{method_name.value} = function() { #{statements.value} }"
  }
end
citrus-3.0.2/test/_files/rule1.citrus0000644000175000017500000000002013124403235016541 0ustar pravipravi
rule int
  ''
end
citrus-3.0.2/test/_files/file2.citrus0000644000175000017500000000021213124403235016515 0ustar pravipravi
grammar SuperTwo
  rule root
    (one | two) { super() * 1000 }
  end

  rule one
    "1" { 1 }
  end

  rule two
    "2" { 2 }
  end
end
citrus-3.0.2/test/_files/grammar3.citrus0000644000175000017500000000013013124403235017224 0ustar pravipravi
grammar EndTagGrammar
  rule end_tag
    /*./
  end

  rule number
    end_tag
  end
end
citrus-3.0.2/test/_files/rule6.citrus0000644000175000017500000000001713124403235016554 0ustar pravipravi
rule super
end
citrus-3.0.2/test/_files/rule5.citrus0000644000175000017500000000002213124403235016547 0ustar pravipravi
rule super
  ''
end
citrus-3.0.2/citrus.gemspec0000644000175000017500000000157613124403235014735 0ustar pravipravi
$LOAD_PATH.unshift(File.expand_path('../lib', __FILE__))

require 'citrus/version'

Gem::Specification.new do |s|
  s.name = 'citrus'
  s.version = Citrus.version
  s.date = Time.now.strftime('%Y-%m-%d')

  s.summary = 'Parsing Expressions for Ruby'
  s.description = 'Parsing Expressions for Ruby'

  s.author = 'Michael Jackson'
  s.email = 'mjijackson@gmail.com'

  s.require_paths = %w< lib >

  s.files = Dir['benchmark/**'] +
    Dir['doc/**'] +
    Dir['extras/**'] +
    Dir['lib/**/*.rb'] +
    Dir['test/**/*'] +
    %w< citrus.gemspec Rakefile README.md CHANGES >

  s.test_files = s.files.select {|path| path =~ /^test\/.*_test.rb/ }

  s.add_development_dependency('rake')

  s.has_rdoc = true
  s.rdoc_options = %w< --line-numbers --inline-source --title Citrus --main Citrus >
  s.extra_rdoc_files = %w< README.md CHANGES >

  s.homepage = 'http://mjackson.github.io/citrus'
end