// jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Substitution.java
/*
* $Id: Perl5Substitution.java,v 1.13 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* The substitution string may contain variable interpolations referring
* to the saved parenthesized groups of the search pattern.
* A variable interpolation is denoted by $1, or $2,
* or $3, etc. If you want such expressions to be
* interpreted literally, you should set the numInterpolations
* parameter to INTERPOLATE_NONE. It is easiest to explain
* what an interpolated variable does by giving an example:
*
* Tank b123: 85 Tank b256: 32 Tank b78: 22
*
* and use a numInterpolations value of INTERPOLATE_ALL and
* numSubs value (see
* {@link Util#substitute Util.substitute})
* of SUBSTITUTE_ALL, then your result will be:
*
* Tank a123- 85 Tank a256- 32 Tank a78- 22
*
* But if you set numInterpolations to 2 and keep
* numSubs with a value of SUBSTITUTE_ALL, your result is:
*
* Tank a123- 85 Tank a256- 32 Tank a256- 22
*
* Notice how the last substitution uses the same value for $1
* as the second substitution.
*
* A final thing to keep in mind is that if you use an interpolation variable
* that corresponds to a group not contained in the match, then it is
* interpreted as the empty string. So given the regular expression from the
* example, and a substitution expression of a$2-, the result
* of the last sample input would be:
*
* Tank a- 85 Tank a- 32 Tank a- 22
*
* The special substitution $& will interpolate the entire portion
* of the input matched by the regular expression. $0 will
* do the same, but it is recommended that it be avoided because the
* latest versions of Perl use $0 to store the program name rather
* than duplicate the behavior of $&.
* Also, the result of substituting $ followed by a non-positive integer
* is undefined. In order to include a $ in a substitution, it should
* be escaped with a backslash (e.g., "\\$0").
*
* Perl5 double-quoted string case modification is also supported in
* the substitution. The following escape sequences are supported:
*
* @param substitution The string to use as a substitution.
*/
public Perl5Substitution(String substitution) {
  this(substitution, INTERPOLATE_ALL);
}

/**
* Creates a Perl5Substitution using the specified substitution
* and setting the number of interpolations to the specified value.
*
* @param substitution The string to use as a substitution.
* @param numInterpolations
* If set to INTERPOLATE_NONE, interpolation variables are
* interpreted literally and not as references to the saved
* parenthesized groups of a pattern match. If set to
* INTERPOLATE_ALL, all variable interpolations
* are computed relative to the pattern match responsible for
* the current substitution. If set to a positive integer,
* the first numInterpolations substitutions have
* their variable interpolation performed relative to the
* most recent match, but the remaining substitutions have
* their variable interpolations performed relative to the
* numInterpolations'th match.
*/
public Perl5Substitution(String substitution, int numInterpolations) {
  setSubstitution(substitution, numInterpolations);
}

/**
* Sets the substitution represented by this Perl5Substitution, also
* setting the number of interpolations to
* {@link #INTERPOLATE_ALL}.
* You should use this method in order to avoid repeatedly allocating new
* Perl5Substitutions. It is recommended that you allocate a single
* Perl5Substitution and reuse it by using this method when appropriate.
*
* @param substitution The string to use as a substitution.
*/
public void setSubstitution(String substitution) {
  setSubstitution(substitution, INTERPOLATE_ALL);
}

/**
* Sets the substitution represented by this Perl5Substitution, also
* setting the number of interpolations to the specified value.
* You should use this method in order to avoid repeatedly allocating new
* Perl5Substitutions. It is recommended that you allocate a single
* Perl5Substitution and reuse it by using this method when appropriate.
*
* @param substitution The string to use as a substitution.
* @param numInterpolations
* If set to INTERPOLATE_NONE, interpolation variables are
* interpreted literally and not as references to the saved
* parenthesized groups of a pattern match. If set to
* INTERPOLATE_ALL, all variable interpolations
* are computed relative to the pattern match responsible for
* the current substitution. If set to a positive integer,
* the first numInterpolations substitutions have
* their variable interpolation performed relative to the
* most recent match, but the remaining substitutions have
* their variable interpolations performed relative to the
* numInterpolations'th match.
*/
public void setSubstitution(String substitution, int numInterpolations) {
  super.setSubstitution(substitution);
  _numInterpolations = numInterpolations;

  // Only parse the substitution when it can contain interpolation
  // variables ($) or escape sequences (\).
  if(numInterpolations != INTERPOLATE_NONE &&
     (substitution.indexOf('$') != -1 || substitution.indexOf('\\') != -1))
    __parseSubs(substitution);
  else
    _subOpcodes = null;
  _lastInterpolation = null;
}

/**
* Appends the substitution to a buffer containing the original input
* with substitutions applied for the pattern matches found so far.
* See
* {@link Substitution#appendSubstitution Substitution.appendSubstitution()}
* for more details regarding the expected behavior of this method.
*
* @param appendBuffer The buffer containing the new string resulting
* from performing substitutions on the original input.
* @param match The current match causing a substitution to be made.
* @param substitutionCount The number of substitutions that have been
* performed so far by Util.substitute.
* @param originalInput The original input upon which the substitutions are
* being performed. This is a read-only parameter and is not modified.
* @param matcher The PatternMatcher used to find the current match.
* @param pattern The Pattern used to find the current match.
*/
public void appendSubstitution(StringBuffer appendBuffer, MatchResult match,
                               int substitutionCount,
                               PatternMatcherInput originalInput,
                               PatternMatcher matcher, Pattern pattern)
{
  if(_subOpcodes == null) {
    super.appendSubstitution(appendBuffer, match, substitutionCount,
                             originalInput, matcher, pattern);
    return;
  }
  if(_numInterpolations < 1 || substitutionCount < _numInterpolations)
    _calcSub(appendBuffer, match);
  else {
    if(substitutionCount == _numInterpolations)
      _lastInterpolation = _finalInterpolatedSub(match);
    appendBuffer.append(_lastInterpolation);
  }
}
}
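
// Illustrative sketch, not part of the library: the numInterpolations
// rule used by appendSubstitution above can be mimicked with
// java.util.regex as a stand-in for the ORO matcher classes. The class
// and method names below (InterpolationSketch, substitute, expand) are
// hypothetical, and the template expander only understands $0..$9; it
// ignores the escape sequences that the real class supports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InterpolationSketch {
  // numInterpolations < 1 means "interpolate against every match",
  // mirroring the (_numInterpolations < 1) test in appendSubstitution.
  static String substitute(String regex, String template, String input,
                           int numInterpolations) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuilder out = new StringBuilder();
    String frozen = null;   // plays the role of _lastInterpolation
    int count = 0;          // 1-based, like substitutionCount
    int last = 0;
    while (m.find()) {
      ++count;
      out.append(input, last, m.start());
      if (numInterpolations < 1 || count < numInterpolations) {
        out.append(expand(template, m));
      } else {
        // From the numInterpolations'th match onward, reuse the text
        // computed for that match.
        if (count == numInterpolations) {
          frozen = expand(template, m);
        }
        out.append(frozen);
      }
      last = m.end();
    }
    out.append(input, last, input.length());
    return out.toString();
  }

  // Expands $0..$9 only; a group that is absent from the pattern or did
  // not participate in the match expands to the empty string.
  static String expand(String template, Matcher m) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < template.length(); i++) {
      char c = template.charAt(i);
      if (c == '$' && i + 1 < template.length()
          && Character.isDigit(template.charAt(i + 1))) {
        int g = template.charAt(++i) - '0';
        if (g <= m.groupCount() && m.group(g) != null) {
          sb.append(m.group(g));
        }
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

// With the pattern b(\d+): and the template a$1-, passing 2 as
// numInterpolations freezes the replacement text at the second match,
// which reproduces the "Tank a256- 22" result described in the class
// comment.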
// jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Util.java
/*
* $Id: Util.java,v 1.15 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000-2002 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* A grep method is not included for two reasons: *
* String instance and stores results as a
* List of substrings numbering no more than a specified
* limit. The string is split with a regular expression as the delimiter.
* The limit parameter essentially says to split the
* string only on at most the first limit - 1 number of pattern
* occurrences.
*
* This method is inspired by the Perl split() function and behaves
* identically to it when used in conjunction with the Perl5Matcher and
* Perl5Pattern classes except for the following difference:
*
* In Perl, if the split expression contains parentheses, the split()
* method creates additional list elements from each of the matching
* subgroups in the pattern. In other words:
*
* split(list, "/([,-])/", "8-12,15,18", Util.SPLIT_ALL)
*
* produces the list containing:
*
* { "8", "-", "12", ",", "15", ",", "18" }
*
* The OROMatcher split method does not follow this behavior. The
* following list would be produced by OROMatcher:
*
* { "8", "12", "15", "18" }
*
* To obtain the Perl behavior, use
* {@link org.apache.oro.text.perl.Perl5Util#split}.
*
* @param results A Collection to which the split results are appended.
* After the method returns, it contains the substrings of the input
* that occur between the regular expression delimiter occurrences.
* The input will not be split into any more substrings than the
* specified limit. A way of thinking of this is that
* only the first limit - 1 matches of the delimiting
* regular expression will be used to split the input.
* @param matcher The regular expression matcher to execute the split.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @param limit The limit on the number of resulting split elements.
* Values <= 0 produce the same behavior as using the
* SPLIT_ALL constant which causes the limit to be
* ignored and splits to be performed on all occurrences of
* the pattern. You should use the SPLIT_ALL constant
* to achieve this behavior instead of relying on the default
* behavior associated with non-positive limit values.
* @since 2.0
*/
public static void split(Collection results, PatternMatcher matcher,
                         Pattern pattern, String input, int limit)
{
  int beginOffset;
  MatchResult currentResult;
  PatternMatcherInput pinput;

  pinput = new PatternMatcherInput(input);
  beginOffset = 0;

  // The pre-decrement means only the first limit - 1 matches are used;
  // a non-positive limit never reaches 0, so every match is used.
  while(--limit != 0 && matcher.contains(pinput, pattern)) {
    currentResult = matcher.getMatch();
    results.add(input.substring(beginOffset,
                                currentResult.beginOffset(0)));
    beginOffset = currentResult.endOffset(0);
  }

  results.add(input.substring(beginOffset, input.length()));
}
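
// Illustrative sketch, not part of the library: the limit handling of
// the split method above, rewritten against java.util.regex so it can
// run without the ORO classes. SplitSketch is a hypothetical name, and
// the value 0 for SPLIT_ALL is an assumption; the excerpt only states
// that values <= 0 behave like SPLIT_ALL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitSketch {
  static final int SPLIT_ALL = 0;  // assumed value, see note above

  static List<String> split(Pattern pattern, String input, int limit) {
    List<String> results = new ArrayList<>();
    Matcher m = pattern.matcher(input);
    int begin = 0;
    // Same loop shape as Util.split: at most limit - 1 delimiters are
    // consumed, so at most limit substrings are produced.
    while (--limit != 0 && m.find()) {
      results.add(input.substring(begin, m.start()));
      begin = m.end();
    }
    // Whatever follows the last consumed delimiter is the final element.
    results.add(input.substring(begin));
    return results;
  }
}
```

// For the input "8-12,15,18" and the delimiter [,-], a limit of 2
// consumes only the first delimiter and leaves "12,15,18" intact as the
// second element, while SPLIT_ALL yields all four numbers.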
/**
* Splits up a String instance and stores results as a
* Collection of all its substrings using a regular expression
* as the delimiter.
* This method is inspired by the Perl split() function and behaves
* identically to it when used in conjunction with the Perl5Matcher and
* Perl5Pattern classes except for the following difference:
*
* In Perl, if the split expression contains parentheses, the split()
* method creates additional list elements from each of the matching
* subgroups in the pattern. In other words:
*
* split(list, "/([,-])/", "8-12,15,18")
*
* produces the list containing:
*
* { "8", "-", "12", ",", "15", ",", "18" }
*
* The OROMatcher split method does not follow this behavior. The
* following list would be produced by OROMatcher:
*
* { "8", "12", "15", "18" }
*
* To obtain the Perl behavior, use
* {@link org.apache.oro.text.perl.Perl5Util#split}.
*
* This method is identical to calling:
*
* split(results, matcher, pattern, input, Util.SPLIT_ALL);
*
* @param results A Collection to which all the substrings of
* the input that occur between the regular expression delimiter
* occurrences are appended.
* @param matcher The regular expression matcher to execute the split.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @since 2.0
*/
public static void split(Collection results, PatternMatcher matcher,
Pattern pattern, String input)
{
split(results, matcher, pattern, input, SPLIT_ALL);
}
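
// Illustrative sketch, not part of the library: the Perl behavior that
// the comment above contrasts with Util.split, where each captured
// subgroup of the delimiter becomes an extra list element. It uses
// java.util.regex as a stand-in, PerlStyleSplitSketch is a hypothetical
// name, and it makes no claim about how Perl5Util.split is implemented.
// Unlike Perl, it does not drop trailing empty fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerlStyleSplitSketch {
  static List<String> split(Pattern pattern, String input) {
    List<String> results = new ArrayList<>();
    Matcher m = pattern.matcher(input);
    int begin = 0;
    while (m.find()) {
      // Text between the previous delimiter and this one.
      results.add(input.substring(begin, m.start()));
      // Each capturing group of the delimiter is appended as well;
      // null stands in for Perl's undef when a group did not match.
      for (int g = 1; g <= m.groupCount(); g++) {
        results.add(m.group(g));
      }
      begin = m.end();
    }
    results.add(input.substring(begin));
    return results;
  }
}
```

// Splitting "8-12,15,18" on ([,-]) with this sketch interleaves the
// delimiters with the numbers, matching the seven-element list shown in
// the comment, whereas a delimiter without parentheses gives the
// four-element list.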
/**
* Splits up a String instance into strings contained in a
* Vector of size not greater than a specified limit. The
* string is split with a regular expression as the delimiter.
* The limit parameter essentially says to split the
* string only on at most the first limit - 1 number of pattern
* occurrences.
*
* This method is inspired by the Perl split() function and behaves
* identically to it when used in conjunction with the Perl5Matcher and
* Perl5Pattern classes except for the following difference:
*
* In Perl, if the split expression contains parentheses, the split()
* method creates additional list elements from each of the matching
* subgroups in the pattern. In other words:
*
* split("/([,-])/", "8-12,15,18")
*
* produces the Vector containing:
*
* { "8", "-", "12", ",", "15", ",", "18" }
*
* The OROMatcher split method does not follow this behavior. The
* following Vector would be produced by OROMatcher:
*
* { "8", "12", "15", "18" }
*
* To obtain the Perl behavior, use
* {@link org.apache.oro.text.perl.Perl5Util#split}.
*
* @deprecated Use
* {@link #split(Collection, PatternMatcher, Pattern, String, int)} instead.
* @param matcher The regular expression matcher to execute the split.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @param limit The limit on the size of the returned Vector.
* Values <= 0 produce the same behavior as using the
* SPLIT_ALL constant which causes the limit to be
* ignored and splits to be performed on all occurrences of
* the pattern. You should use the SPLIT_ALL constant
* to achieve this behavior instead of relying on the default
* behavior associated with non-positive limit values.
* @return A Vector containing the substrings of the input
* that occur between the regular expression delimiter occurrences.
* The input will not be split into any more substrings than the
* specified limit. A way of thinking of this is that
* only the first limit - 1 matches of the delimiting
* regular expression will be used to split the input.
* @since 1.0
*/
public static Vector split(PatternMatcher matcher, Pattern pattern,
String input, int limit)
{
Vector results = new Vector(20);
split(results, matcher, pattern, input, limit);
return results;
}
/**
* Splits up a String instance into a Vector
* of all its substrings using a regular expression as the delimiter.
* This method is inspired by the Perl split() function and behaves
* identically to it when used in conjunction with the Perl5Matcher and
* Perl5Pattern classes except for the following difference:
*
* In Perl, if the split expression contains parentheses, the split()
* method creates additional list elements from each of the matching
* subgroups in the pattern. In other words:
*
* split("/([,-])/", "8-12,15,18")
*
* produces the Vector containing:
*
* { "8", "-", "12", ",", "15", ",", "18" }
*
* The OROMatcher split method does not follow this behavior. The
* following Vector would be produced by OROMatcher:
*
* { "8", "12", "15", "18" }
*
* To obtain the Perl behavior, use
* {@link org.apache.oro.text.perl.Perl5Util#split}.
*
* This method is identical to calling:
*
* split(matcher, pattern, input, Util.SPLIT_ALL);
*
* @deprecated Use
* {@link #split(Collection, PatternMatcher, Pattern, String)} instead.
* @param matcher The regular expression matcher to execute the split.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @return A Vector containing all the substrings of the input
* that occur between the regular expression delimiter occurrences.
* @since 1.0
*/
public static Vector split( PatternMatcher matcher, Pattern pattern,
String input)
{
return split(matcher, pattern, input, SPLIT_ALL);
}
/**
* Searches a string for a pattern and replaces the first occurrences
* of the pattern with a Substitution up to the number of
* substitutions specified by the numSubs parameter. A
* numSubs value of SUBSTITUTE_ALL will cause all occurrences
* of the pattern to be replaced.
*
* @param matcher The regular expression matcher to execute the pattern
* search.
* @param pattern The regular expression to search for and substitute
* occurrences of.
* @param sub The Substitution used to substitute pattern occurrences.
* @param input The String on which to perform substitutions.
* @param numSubs The number of substitutions to perform. Only the
* first numSubs patterns encountered are
* substituted. If you want to substitute all occurrences
* set this parameter to SUBSTITUTE_ALL.
* @return A String comprising the input string with the substitutions,
* if any, made. If no substitutions are made, the returned String
* is the original input String.
* @since 1.0
*/
public static String substitute(PatternMatcher matcher, Pattern pattern,
                                Substitution sub, String input, int numSubs)
{
  StringBuffer buffer = new StringBuffer(input.length());
  PatternMatcherInput pinput = new PatternMatcherInput(input);

  // Users have indicated that they expect the result to be the
  // original input string, rather than a copy, if no substitutions
  // are performed.
  if(substitute(buffer, matcher, pattern, sub, pinput, numSubs) != 0)
    return buffer.toString();
  return input;
}
/**
* Searches a string for a pattern and substitutes only the first
* occurrence of the pattern.
*
* This method is identical to calling:
*
* substitute(matcher, pattern, sub, input, 1);
*
* @param matcher The regular expression matcher to execute the pattern
* search.
* @param pattern The regular expression to search for and substitute
* occurrences of.
* @param sub The Substitution used to substitute pattern occurrences.
* @param input The String on which to perform substitutions.
* @return A String comprising the input string with the substitutions,
* if any, made. If no substitutions are made, the returned String
* is the original input String.
* @since 1.0
*/
public static String substitute(PatternMatcher matcher, Pattern pattern,
Substitution sub, String input)
{
return substitute(matcher, pattern, sub, input, 1);
}
/**
* Searches a string for a pattern and replaces the first occurrences
* of the pattern with a Substitution up to the number of
* substitutions specified by the numSubs parameter. A
* numSubs value of SUBSTITUTE_ALL will cause all occurrences
* of the pattern to be replaced. The number of substitutions made
* is returned.
*
* @param result The StringBuffer in which to store the result of the
* substitutions. The buffer is only appended to.
* @param matcher The regular expression matcher to execute the pattern
* search.
* @param pattern The regular expression to search for and substitute
* occurrences of.
* @param sub The Substitution used to substitute pattern occurrences.
* @param input The input on which to perform substitutions.
* @param numSubs The number of substitutions to perform. Only the
* first numSubs patterns encountered are
* substituted. If you want to substitute all occurrences
* set this parameter to SUBSTITUTE_ALL.
* @return The number of substitutions made.
* @since 2.0.6
*/
public static int substitute(StringBuffer result,
                             PatternMatcher matcher, Pattern pattern,
                             Substitution sub, String input,
                             int numSubs)
{
  PatternMatcherInput pinput = new PatternMatcherInput(input);
  return substitute(result, matcher, pattern, sub, pinput, numSubs);
}

/**
* Searches a string for a pattern and replaces the first occurrences
* of the pattern with a Substitution up to the number of
* substitutions specified by the numSubs parameter. A
* numSubs value of SUBSTITUTE_ALL will cause all occurrences
* of the pattern to be replaced. The number of substitutions made
* is returned.
*
* @param result The StringBuffer in which to store the result of the
* substitutions. The buffer is only appended to.
* @param matcher The regular expression matcher to execute the pattern
* search.
* @param pattern The regular expression to search for and substitute
* occurrences of.
* @param sub The Substitution used to substitute pattern occurrences.
* @param input The input on which to perform substitutions.
* @param numSubs The number of substitutions to perform. Only the
* first numSubs patterns encountered are
* substituted. If you want to substitute all occurrences
* set this parameter to SUBSTITUTE_ALL.
* @return The number of substitutions made.
* @since 2.0.3
*/
public static int substitute(StringBuffer result,
                             PatternMatcher matcher, Pattern pattern,
                             Substitution sub, PatternMatcherInput input,
                             int numSubs)
{
  int beginOffset, subCount;
  char[] inputBuffer;

  subCount = 0;
  beginOffset = input.getBeginOffset();
  inputBuffer = input.getBuffer();

  // Must be != 0 because SUBSTITUTE_ALL is represented by -1.
  // Do NOT change to numSubs > 0.
  while(numSubs != 0 && matcher.contains(input, pattern)) {
    --numSubs;
    ++subCount;
    result.append(inputBuffer, beginOffset,
                  input.getMatchBeginOffset() - beginOffset);
    sub.appendSubstitution(result, matcher.getMatch(), subCount,
                           input, matcher, pattern);
    beginOffset = input.getMatchEndOffset();
  }

  result.append(inputBuffer, beginOffset, input.length() - beginOffset);
  return subCount;
}
}
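
// Illustrative sketch, not part of the library: the counting loop of the
// substitute methods above, rewritten against java.util.regex with a
// literal replacement string in place of a Substitution object.
// SubstituteSketch is a hypothetical name. SUBSTITUTE_ALL is taken to be
// -1 because the comment inside the loop above says so; that is also why
// the loop tests numSubs != 0 rather than numSubs > 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstituteSketch {
  static final int SUBSTITUTE_ALL = -1;

  // Appends the rewritten input to result and returns how many
  // substitutions were made.
  static int substitute(StringBuilder result, Pattern pattern,
                        String replacement, String input, int numSubs) {
    Matcher m = pattern.matcher(input);
    int begin = 0;
    int subCount = 0;
    // A negative numSubs never reaches 0, so every match is replaced.
    while (numSubs != 0 && m.find()) {
      --numSubs;
      ++subCount;
      result.append(input, begin, m.start());  // text before the match
      result.append(replacement);              // literal, no $1 handling
      begin = m.end();
    }
    result.append(input, begin, input.length());
    return subCount;
  }

  // Returns the original String instance when nothing was replaced,
  // as the String-returning substitute method above does.
  static String substitute(Pattern pattern, String replacement,
                           String input, int numSubs) {
    StringBuilder buffer = new StringBuilder(input.length());
    if (substitute(buffer, pattern, replacement, input, numSubs) != 0) {
      return buffer.toString();
    }
    return input;
  }
}
```

// Replacing "a" in "banana" with a numSubs of 2 touches only the first
// two occurrences; SUBSTITUTE_ALL touches all three. When the pattern
// never matches, the caller gets back the same String object it passed
// in rather than a copy.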
// jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Compiler.java
/*
* $Id: Perl5Compiler.java,v 1.21 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* Perl5Compiler and Perl5Matcher are designed with the intent that * you use a separate instance of each per thread to avoid the overhead * of both synchronization and concurrent access (e.g., a match that takes * a long time in one thread will block the progress of another thread with * a shorter match). If you want to use a single instance of each * in a concurrent program, you must appropriately protect access to * the instances with critical sections. If you want to share Perl5Pattern * instances between concurrently executing instances of Perl5Matcher, you * must compile the patterns with {@link Perl5Compiler#READ_ONLY_MASK}. * * @version @version@ * @since 1.0 * @see PatternCompiler * @see MalformedPatternException * @see Perl5Pattern * @see Perl5Matcher */ public final class Perl5Compiler implements PatternCompiler { private static final int __WORSTCASE = 0, __NONNULL = 0x1, __SIMPLE = 0x2, __SPSTART = 0x4, __TRYAGAIN = 0x8; private static final char __CASE_INSENSITIVE = 0x0001, __GLOBAL = 0x0002, __KEEP = 0x0004, __MULTILINE = 0x0008, __SINGLELINE = 0x0010, __EXTENDED = 0x0020, __READ_ONLY = 0x8000; private static final String __HEX_DIGIT = "0123456789abcdef0123456789ABCDEFx"; private CharStringPointer __input; private boolean __sawBackreference; private char[] __modifierFlags = { 0 }; // IMPORTANT: __numParentheses starts out equal to 1 during compilation. // It is always one greater than the number of parentheses encountered // so far in the regex. That is because it refers to the number of groups // to save, and the entire match is always saved (group 0) private int __numParentheses, __programSize, __cost; // When doing the second pass and actually generating code, __programSize // keeps track of the current offset. 
private char[] __program; /** Lookup table for POSIX character class names */ private static final HashMap __hashPOSIX; static { __hashPOSIX = new HashMap(); __hashPOSIX.put("alnum", new Character(OpCode._ALNUMC)); __hashPOSIX.put("word", new Character(OpCode._ALNUM)); __hashPOSIX.put("alpha", new Character(OpCode._ALPHA)); __hashPOSIX.put("blank", new Character(OpCode._BLANK)); __hashPOSIX.put("cntrl", new Character(OpCode._CNTRL)); __hashPOSIX.put("digit", new Character(OpCode._DIGIT)); __hashPOSIX.put("graph", new Character(OpCode._GRAPH)); __hashPOSIX.put("lower", new Character(OpCode._LOWER)); __hashPOSIX.put("print", new Character(OpCode._PRINT)); __hashPOSIX.put("punct", new Character(OpCode._PUNCT)); __hashPOSIX.put("space", new Character(OpCode._SPACE)); __hashPOSIX.put("upper", new Character(OpCode._UPPER)); __hashPOSIX.put("xdigit", new Character(OpCode._XDIGIT)); __hashPOSIX.put("ascii", new Character(OpCode._ASCII)); } /** * The default mask for the {@link #compile compile} methods. * It is equal to 0. * The default behavior is for a regular expression to be case sensitive * and to not specify if it is multiline or singleline. When MULITLINE_MASK * and SINGLINE_MASK are not defined, the ^, $, and . * metacharacters are * interpreted according to the value of isMultiline() in Perl5Matcher. * The default behavior of Perl5Matcher is to treat the Perl5Pattern * as though MULTILINE_MASK were enabled. If isMultiline() returns false, * then the pattern is treated as though SINGLINE_MASK were set. However, * compiling a pattern with the MULTILINE_MASK or SINGLELINE_MASK masks * will ALWAYS override whatever behavior is specified by the setMultiline() * in Perl5Matcher. */ public static final int DEFAULT_MASK = 0; /** * A mask passed as an option to the {@link #compile compile} methods * to indicate a compiled regular expression should be case insensitive. 
*/ public static final int CASE_INSENSITIVE_MASK = __CASE_INSENSITIVE; /** * A mask passed as an option to the {@link #compile compile} methods * to indicate a compiled regular expression should treat input as having * multiple lines. This option affects the interpretation of * the ^ and $ metacharacters. When this mask is used, * the ^ metacharacter matches at the beginning of every line, * and the $ metacharacter matches at the end of every line. * Additionally the . metacharacter will not match newlines when * an expression is compiled with MULTILINE_MASK , which is its * default behavior. */ public static final int MULTILINE_MASK = __MULTILINE; /** * A mask passed as an option to the {@link #compile compile} methods * to indicate a compiled regular expression should treat input as being * a single line. This option affects the interpretation of * the ^ and $ metacharacters. When this mask is used, * the ^ metacharacter matches at the beginning of the input, * and the $ metacharacter matches at the end of the input. * The ^ and $ metacharacters will not match at the beginning * and end of lines occurring between the begnning and end of the input. * Additionally, the . metacharacter will match newlines when * an expression is compiled with SINGLELINE_MASK , unlike its * default behavior. */ public static final int SINGLELINE_MASK = __SINGLELINE; /** * A mask passed as an option to the {@link #compile compile} methods * to indicate a compiled regular expression should be treated as a Perl5 * extended pattern (i.e., a pattern using the /x modifier). This * option tells the compiler to ignore whitespace that is not backslashed or * within a character class. It also tells the compiler to treat the * # character as a metacharacter introducing a comment as in * Perl. In other words, the # character will comment out any * text in the regular expression between it and the next newline. 
   * The intent of this option is to allow you to divide your patterns
   * into more readable parts. It is provided to maintain compatibility
   * with Perl5 regular expressions, although it will not often
   * make sense to use it in Java.
   */
  public static final int EXTENDED_MASK = __EXTENDED;

  /**
   * A mask passed as an option to the {@link #compile compile} methods
   * to indicate that the resulting Perl5Pattern should be treated as a
   * read only data structure by Perl5Matcher, making it safe to share
   * a single Perl5Pattern instance among multiple threads without needing
   * synchronization. Without this option, Perl5Matcher reserves the right
   * to store heuristic or other information in Perl5Pattern that might
   * accelerate future matches. When you use this option, Perl5Matcher will
   * not store or modify any information in a Perl5Pattern. Use this option
   * when you want to share a Perl5Pattern instance among multiple threads
   * using different Perl5Matcher instances.
   */
  public static final int READ_ONLY_MASK = __READ_ONLY;

  /**
   * Given a character string, returns a Perl5 expression that interprets
   * each character of the original string literally. In other words, all
   * special metacharacters are quoted/escaped. This method is useful for
   * converting user input meant for literal interpretation into a safe
   * regular expression representing the literal input.
   *
   * In effect, this method is the analog of the Perl5 quotemeta() built-in
   * function.
   *
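   * The escaping rule described above can be illustrated with a minimal
   * standalone sketch. It is not part of this library: QuotemetaSketch and
   * isWordCharacter are hypothetical names, and the sketch assumes that a
   * word character is an ASCII letter, digit, or underscore.
   *
```java
// Hypothetical standalone sketch of the escaping rule; not library code.
final class QuotemetaSketch {
  // Assumption: a "word character" is an ASCII letter, digit, or underscore.
  static boolean isWordCharacter(char ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
        || (ch >= '0' && ch <= '9') || ch == '_';
  }

  // Prefix every non-word character with a backslash so that a regular
  // expression compiler reads it as a literal character.
  static String quotemeta(String expression) {
    StringBuilder buffer = new StringBuilder(2 * expression.length());
    for (int i = 0; i < expression.length(); i++) {
      char ch = expression.charAt(i);
      if (!isWordCharacter(ch))
        buffer.append('\\');
      buffer.append(ch);
    }
    return buffer.toString();
  }

  public static void main(String[] args) {
    System.out.println(quotemeta("a.b*c")); // prints a\.b\*c
  }
}
```
   *
   * Word characters pass through unchanged, so escaping a plain identifier
   * returns the same string.
   *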
   * @param expression The expression to convert.
   * @return A String containing a Perl5 regular expression corresponding to
   *         a literal interpretation of the pattern.
   */
  public static final String quotemeta(char[] expression) {
    int ch;
    StringBuffer buffer;

    buffer = new StringBuffer(2*expression.length);
    for(ch = 0; ch < expression.length; ch++) {
      if(!OpCode._isWordCharacter(expression[ch]))
        buffer.append('\\');
      buffer.append(expression[ch]);
    }

    return buffer.toString();
  }

  /**
   * Given a character string, returns a Perl5 expression that interprets
   * each character of the original string literally. In other words, all
   * special metacharacters are quoted/escaped. This method is useful for
   * converting user input meant for literal interpretation into a safe
   * regular expression representing the literal input.
   *
   * In effect, this method is the analog of the Perl5 quotemeta() built-in
   * function.
   *
   * @param expression The expression to convert.
   * @return A String containing a Perl5 regular expression corresponding to
   *         a literal interpretation of the expression.
   */
  public static final String quotemeta(String expression) {
    return quotemeta(expression.toCharArray());
  }

  private static boolean __isSimpleRepetitionOp(char ch) {
    return (ch == '*' || ch == '+' || ch == '?');
  }

  private static boolean __isComplexRepetitionOp(char[] ch, int offset) {
    if(offset < ch.length && offset >= 0)
      return (ch[offset] == '*' || ch[offset] == '+' || ch[offset] == '?'
              || (ch[offset] == '{' && __parseRepetition(ch, offset)));
    return false;
  }

  // determines if {\d+,\d*} is the next part of the string
  private static boolean __parseRepetition(char[] str, int offset) {
    if(str[offset] != '{')
      return false;
    ++offset;

    if(offset >= str.length || !Character.isDigit(str[offset]))
      return false;

    while(offset < str.length && Character.isDigit(str[offset]))
      ++offset;

    if(offset < str.length && str[offset] == ',')
      ++offset;

    while(offset < str.length && Character.isDigit(str[offset]))
      ++offset;

    if(offset >= str.length || str[offset] != '}')
      return false;

    return true;
  }

  private static int __parseHex(char[] str, int offset, int maxLength,
                                int[] scanned)
  {
    int val = 0, index;

    scanned[0] = 0;
    while(offset < str.length && maxLength-- > 0 &&
          (index = __HEX_DIGIT.indexOf(str[offset])) != -1) {
      val <<= 4;
      val |= (index & 15);
      ++offset;
      ++scanned[0];
    }

    return val;
  }

  private static int __parseOctal(char[] str, int offset, int maxLength,
                                  int[] scanned)
  {
    int val = 0;

    scanned[0] = 0;
    while(offset < str.length &&
          maxLength > 0 && str[offset] >= '0' && str[offset] <= '7') {
      val <<= 3;
      val |= (str[offset] - '0');
      --maxLength;
      ++offset;
      ++scanned[0];
    }

    return val;
  }

  private static void __setModifierFlag(char[] flags, char ch) {
    switch(ch) {
    case 'i' : flags[0] |= __CASE_INSENSITIVE; return;
    case 'g' : flags[0] |= __GLOBAL; return;
    case 'o' : flags[0] |= __KEEP; return;
    case 'm' : flags[0] |= __MULTILINE; return;
    case 's' :
flags[0] |= __SINGLELINE; return; case 'x' : flags[0] |= __EXTENDED; return; } } // Emit a specific character code. private void __emitCode(char code) { if(__program != null) __program[__programSize] = code; ++__programSize; } // Emit an operator with no arguments. // Return an offset into the __program array as a pointer to node. private int __emitNode(char operator) { int offset; offset = __programSize; if(__program == null) __programSize+=2; else { __program[__programSize++] = operator; __program[__programSize++] = OpCode._NULL_POINTER; } return offset; } // Emit an operator with arguments. // Return an offset into the __programarray as a pointer to node. private int __emitArgNode(char operator, char arg) { int offset; offset = __programSize; if(__program== null) __programSize+=3; else { __program[__programSize++] = operator; __program[__programSize++] = OpCode._NULL_POINTER; __program[__programSize++] = arg; } return offset; } // Insert an operator at a given offset. private void __programInsertOperator(char operator, int operand) { int src, dest, offset; offset = (OpCode._opType[operator] == OpCode._CURLY ? 
2 : 0); if(__program== null) { __programSize+=(2 + offset); return; } src = __programSize; __programSize+=(2 + offset); dest = __programSize; while(src > operand) { --src; --dest; __program[dest] = __program[src]; } __program[operand++] = operator; __program[operand++] = OpCode._NULL_POINTER; while(offset-- > 0) __program[operand++] = OpCode._NULL_POINTER; } private void __programAddTail(int current, int value) { int scan, temp, offset; if(__program == null || current == OpCode._NULL_OFFSET) return; scan = current; while(true) { temp = OpCode._getNext(__program, scan); if(temp == OpCode._NULL_OFFSET) break; scan = temp; } if(__program[scan] == OpCode._BACK) offset = scan - value; else offset = value - scan; __program[scan + 1] = (char)offset; } private void __programAddOperatorTail(int current, int value) { if(__program == null || current == OpCode._NULL_OFFSET || OpCode._opType[__program[current]] != OpCode._BRANCH) return; __programAddTail(OpCode._getNextOperator(current), value); } private char __getNextChar() { char ret, value; ret = __input._postIncrement(); while(true) { value = __input._getValue(); if(value == '(' && __input._getValueRelative(1) == '?' 
&& __input._getValueRelative(2) == '#') { // Skip comments while(value != CharStringPointer._END_OF_STRING && value != ')') value = __input._increment(); __input._increment(); continue; } if((__modifierFlags[0] & __EXTENDED) != 0) { if(Character.isWhitespace(value)) { __input._increment(); continue; } else if(value == '#') { while(value != CharStringPointer._END_OF_STRING && value != '\n') value = __input._increment(); __input._increment(); continue; } } return ret; } } private int __parseAlternation(int[] retFlags) throws MalformedPatternException { int chain, offset, latest; int flags = 0; char value; retFlags[0] = __WORSTCASE; offset = __emitNode(OpCode._BRANCH); chain = OpCode._NULL_OFFSET; if(__input._getOffset() == 0) { __input._setOffset(-1); __getNextChar(); } else { __input._decrement(); __getNextChar(); } value = __input._getValue(); while(value != CharStringPointer._END_OF_STRING && value != '|' && value != ')') { flags &= ~__TRYAGAIN; latest = __parseBranch(retFlags); if(latest == OpCode._NULL_OFFSET) { if((flags & __TRYAGAIN) != 0){ value = __input._getValue(); continue; } return OpCode._NULL_OFFSET; } retFlags[0] |= (flags & __NONNULL); if(chain == OpCode._NULL_OFFSET) retFlags[0] |= (flags & __SPSTART); else { ++__cost; __programAddTail(chain, latest); } chain = latest; value = __input._getValue(); } // If loop was never entered. if(chain == OpCode._NULL_OFFSET) __emitNode(OpCode._NOTHING); return offset; } private int __parseAtom(int[] retFlags) throws MalformedPatternException { boolean doDefault; char value; int offset, flags[] = { 0 }; retFlags[0] = __WORSTCASE; doDefault = false; offset = OpCode._NULL_OFFSET; tryAgain: while(true) { value = __input._getValue(); switch(value) { case '^' : __getNextChar(); // The order here is important in order to support /ms. // /m takes precedence over /s for ^ and $, but not for . 
if((__modifierFlags[0] & __MULTILINE) != 0) offset = __emitNode(OpCode._MBOL); else if((__modifierFlags[0] & __SINGLELINE) != 0) offset = __emitNode(OpCode._SBOL); else offset = __emitNode(OpCode._BOL); break tryAgain; case '$': __getNextChar(); // The order here is important in order to support /ms. // /m takes precedence over /s for ^ and $, but not for . if((__modifierFlags[0] & __MULTILINE) != 0) offset = __emitNode(OpCode._MEOL); else if((__modifierFlags[0] & __SINGLELINE) != 0) offset = __emitNode(OpCode._SEOL); else offset = __emitNode(OpCode._EOL); break tryAgain; case '.': __getNextChar(); // The order here is important in order to support /ms. // /m takes precedence over /s for ^ and $, but not for . if((__modifierFlags[0] & __SINGLELINE) != 0) offset = __emitNode(OpCode._SANY); else offset = __emitNode(OpCode._ANY); ++__cost; retFlags[0] |= (__NONNULL | __SIMPLE); break tryAgain; case '[': __input._increment(); offset = __parseUnicodeClass(); retFlags[0] |= (__NONNULL | __SIMPLE); break tryAgain; case '(': __getNextChar(); offset = __parseExpression(true, flags); if(offset == OpCode._NULL_OFFSET) { if((flags[0] & __TRYAGAIN) != 0) continue tryAgain; return OpCode._NULL_OFFSET; } retFlags[0] |= (flags[0] & (__NONNULL | __SPSTART)); break tryAgain; case '|': case ')': if((flags[0] & __TRYAGAIN) != 0) { retFlags[0] |= __TRYAGAIN; return OpCode._NULL_OFFSET; } throw new MalformedPatternException("Error in expression at " + __input._toString(__input._getOffset())); //break tryAgain; case '?': case '+': case '*': throw new MalformedPatternException( "?+* follows nothing in expression"); //break tryAgain; case '\\': value = __input._increment(); switch(value) { case 'A' : offset = __emitNode(OpCode._SBOL); retFlags[0] |= __SIMPLE; __getNextChar(); break; case 'G': offset = __emitNode(OpCode._GBOL); retFlags[0] |= __SIMPLE; __getNextChar(); break; case 'Z': offset = __emitNode(OpCode._SEOL); retFlags[0] |= __SIMPLE; __getNextChar(); break; case 'w': offset = 
__emitNode(OpCode._ALNUM); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'W': offset = __emitNode(OpCode._NALNUM); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'b': offset = __emitNode(OpCode._BOUND); retFlags[0] |= __SIMPLE; __getNextChar(); break; case 'B': offset = __emitNode(OpCode._NBOUND); retFlags[0] |= __SIMPLE; __getNextChar(); break; case 's': offset = __emitNode(OpCode._SPACE); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'S': offset = __emitNode(OpCode._NSPACE); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'd': offset = __emitNode(OpCode._DIGIT); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'D': offset = __emitNode(OpCode._NDIGIT); retFlags[0] |= (__NONNULL | __SIMPLE); __getNextChar(); break; case 'n': case 'r': case 't': case 'f': case 'e': case 'a': case 'x': case 'c': case '0': doDefault = true; break tryAgain; case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': int num; StringBuffer buffer = new StringBuffer(10); num = 0; value = __input._getValueRelative(num); while(Character.isDigit(value)) { buffer.append(value); ++num; value = __input._getValueRelative(num); } try { num = Integer.parseInt(buffer.toString()); } catch(NumberFormatException e) { throw new MalformedPatternException( "Unexpected number format exception. Please report this bug." 
+ "NumberFormatException message: " + e.getMessage()); } if(num > 9 && num >= __numParentheses) { doDefault = true; break tryAgain; } else { // A backreference may only occur AFTER its group if(num >= __numParentheses) throw new MalformedPatternException("Invalid backreference: \\" + num); __sawBackreference = true; offset = __emitArgNode(OpCode._REF, (char)num); retFlags[0] |= __NONNULL; value = __input._getValue(); while(Character.isDigit(value)) value = __input._increment(); __input._decrement(); __getNextChar(); } break; case '\0': case CharStringPointer._END_OF_STRING: if(__input._isAtEnd()) throw new MalformedPatternException("Trailing \\ in expression."); // fall through to default default: doDefault = true; break tryAgain; } break tryAgain; case '#': // skip over comments if((__modifierFlags[0] & __EXTENDED) != 0) { while(!__input._isAtEnd() && __input._getValue() != '\n') __input._increment(); if(!__input._isAtEnd()) continue tryAgain; } // fall through to default default: __input._increment(); doDefault = true; break tryAgain; }// end master switch } // end tryAgain if(doDefault) { char ender; int length, pOffset, maxOffset, lastOffset, numLength[]; offset = __emitNode(OpCode._EXACTLY); // Not sure that it's ok to use 0 to mark end. 
//__emitCode((char)0); __emitCode((char)CharStringPointer._END_OF_STRING); forLoop: for(length = 0, pOffset = __input._getOffset() - 1, maxOffset = __input._getLength(); length < 127 && pOffset < maxOffset; ++length) { lastOffset = pOffset; value = __input._getValue(pOffset); switch(value) { case '^': case '$': case '.': case '[': case '(': case ')': case '|': break forLoop; case '\\': value = __input._getValue(++pOffset); switch(value) { case 'A': case 'G': case 'Z': case 'w': case 'W': case 'b': case 'B': case 's': case 'S': case 'd': case 'D': --pOffset; break forLoop; case 'n': ender = '\n'; ++pOffset; break; case 'r': ender = '\r'; ++pOffset; break; case 't': ender = '\t'; ++pOffset; break; case 'f': ender = '\f'; ++pOffset; break; case 'e': ender = '\033'; ++pOffset; break; case 'a': ender = '\007'; ++pOffset; break; case 'x': numLength = new int[1]; ender = (char)__parseHex(__input._array, ++pOffset, 2, numLength); pOffset+=numLength[0]; break; case 'c': ++pOffset; ender = __input._getValue(pOffset++); if(Character.isLowerCase(ender)) ender = Character.toUpperCase(ender); ender ^= 64; break; case '0': case '1': case '2': case'3': case '4': case '5': case '6': case '7': case '8': case '9': boolean doOctal = false; value = __input._getValue(pOffset); if(value == '0') doOctal = true; value = __input._getValue(pOffset + 1); if(Character.isDigit(value)) { int num; StringBuffer buffer = new StringBuffer(10); num = pOffset; value = __input._getValue(num); while(Character.isDigit(value)){ buffer.append(value); ++num; value = __input._getValue(num); } try { num = Integer.parseInt(buffer.toString()); } catch(NumberFormatException e) { throw new MalformedPatternException( "Unexpected number format exception. Please report this bug." 
+ "NumberFormatException message: " + e.getMessage()); } if(!doOctal) doOctal = (num >= __numParentheses); } if(doOctal) { numLength = new int[1]; ender = (char)__parseOctal(__input._array, pOffset, 3, numLength); pOffset+=numLength[0]; } else { --pOffset; break forLoop; } break; case CharStringPointer._END_OF_STRING: case '\0': if(pOffset >= maxOffset) throw new MalformedPatternException("Trailing \\ in expression."); // fall through to default default: ender = __input._getValue(pOffset++); break; } // end backslash switch break; case '#': if((__modifierFlags[0] & __EXTENDED) != 0) { while(pOffset < maxOffset && __input._getValue(pOffset) != '\n') ++pOffset; } // fall through to whitespace handling case ' ': case '\t': case '\n': case '\r': case '\f': case '\013': if((__modifierFlags[0] & __EXTENDED) != 0) { ++pOffset; --length; continue; } // fall through to default default: ender = __input._getValue(pOffset++); break; } // end master switch if((__modifierFlags[0] & __CASE_INSENSITIVE) != 0 && Character.isUpperCase(ender)) ender = Character.toLowerCase(ender); if(pOffset < maxOffset && __isComplexRepetitionOp(__input._array, pOffset)) { if(length > 0) pOffset = lastOffset; else { ++length; __emitCode(ender); } break; } __emitCode(ender); } // end for loop __input._setOffset(pOffset - 1); __getNextChar(); if(length < 0) throw new MalformedPatternException( "Unexpected compilation failure. Please report this bug!"); if(length > 0) retFlags[0] |= __NONNULL; if(length == 1) retFlags[0] |= __SIMPLE; if(__program!= null) __program[OpCode._getOperand(offset)] = (char)length; //__emitCode('\0'); // debug __emitCode(CharStringPointer._END_OF_STRING); } return offset; } // These are the original 8-bit character class handling methods. // We don't want to delete them just yet only to have to dig it out // of revision control later. /* // Set the bits in a character class. Only recognizes ascii. 
private void __setCharacterClassBits(char[] bits, int offset, char deflt, char ch) { if(__program== null || ch >= 256) return; ch &= 0xffff; if(deflt == 0) { bits[offset + (ch >> 4)] |= (1 << (ch & 0xf)); } else { bits[offset + (ch >> 4)] &= ~(1 << (ch & 0xf)); } } private int __parseCharacterClass() throws MalformedPatternException { boolean range = false, skipTest; char clss, deflt, lastclss = Character.MAX_VALUE; int offset, bits, numLength[] = { 0 }; offset = __emitNode(OpCode._ANYOF); if(__input._getValue() == '^') { ++__cost; __input._increment(); deflt = 0; } else { deflt = 0xffff; } bits = __programSize; for(clss = 0; clss < 16; clss++) __emitCode(deflt); clss = __input._getValue(); if(clss == ']' || clss == '-') skipTest = true; else skipTest = false; while((!__input._isAtEnd() && (clss = __input._getValue()) != ']') || skipTest) { // It sucks, but we have to make this assignment every time skipTest = false; __input._increment(); if(clss == '\\') { clss = __input._postIncrement(); switch(clss){ case 'w': for(clss = 0; clss < 256; clss++) if(OpCode._isWordCharacter(clss)) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 'W': for(clss = 0; clss < 256; clss++) if(!OpCode._isWordCharacter(clss)) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 's': for(clss = 0; clss < 256; clss++) if(Character.isWhitespace(clss)) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 'S': for(clss = 0; clss < 256; clss++) if(!Character.isWhitespace(clss)) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 'd': for(clss = '0'; clss <= '9'; clss++) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 'D': for(clss = 0; clss < '0'; clss++) __setCharacterClassBits(__program, bits, deflt, clss); for(clss = (char)('9' + 1); clss < 
256; clss++) __setCharacterClassBits(__program, bits, deflt, clss); lastclss = Character.MAX_VALUE; continue; case 'n': clss = '\n'; break; case 'r': clss = '\r'; break; case 't': clss = '\t'; break; case 'f': clss = '\f'; break; case 'b': clss = '\b'; break; case 'e': clss = '\033'; break; case 'a': clss = '\007'; break; case 'x': clss = (char)__parseHex(__input._array, __input._getOffset(), 2, numLength); __input._increment(numLength[0]); break; case 'c': clss = __input._postIncrement(); if(Character.isLowerCase(clss)) clss = Character.toUpperCase(clss); clss ^= 64; break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': clss = (char)__parseOctal(__input._array, __input._getOffset() - 1, 3, numLength); __input._increment(numLength[0] - 1); break; } } if(range) { if(lastclss > clss) throw new MalformedPatternException( "Invalid [] range in expression."); range = false; } else { lastclss = clss; if(__input._getValue() == '-' && __input._getOffset() + 1 < __input._getLength() && __input._getValueRelative(1) != ']') { __input._increment(); range = true; continue; } } while(lastclss <= clss) { __setCharacterClassBits(__program, bits, deflt, lastclss); if((__modifierFlags[0] & __CASE_INSENSITIVE) != 0 && Character.isUpperCase(lastclss)) __setCharacterClassBits(__program, bits, deflt, Character.toLowerCase(lastclss)); ++lastclss; } lastclss = clss; } if(__input._getValue() != ']') throw new MalformedPatternException("Unmatched [] in expression."); __getNextChar(); return offset; } */ private int __parseUnicodeClass() throws MalformedPatternException { boolean range = false, skipTest; char clss, lastclss = Character.MAX_VALUE; int offset, numLength[] = { 0 }; boolean negFlag[] = { false }; boolean opcodeFlag; /* clss isn't character when this flag true. 
*/ if(__input._getValue() == '^') { offset = __emitNode(OpCode._NANYOFUN); __input._increment(); } else { offset = __emitNode(OpCode._ANYOFUN); } clss = __input._getValue(); if(clss == ']' || clss == '-') skipTest = true; else skipTest = false; while((!__input._isAtEnd() && (clss = __input._getValue()) != ']') || skipTest) { // It sucks, but we have to make this assignment every time skipTest = false; opcodeFlag = false; __input._increment(); if(clss == '\\' || clss == '[') { if(clss == '\\') { /* character is escaped */ clss = __input._postIncrement(); } else { /* try POSIX expression */ char posixOpCode = __parsePOSIX(negFlag); if(posixOpCode != 0){ opcodeFlag = true; clss = posixOpCode; } } if (opcodeFlag != true) { switch(clss){ case 'w': opcodeFlag = true; clss = OpCode._ALNUM; lastclss = Character.MAX_VALUE; break; case 'W': opcodeFlag = true; clss = OpCode._NALNUM; lastclss = Character.MAX_VALUE; break; case 's': opcodeFlag = true; clss = OpCode._SPACE; lastclss = Character.MAX_VALUE; break; case 'S': opcodeFlag = true; clss = OpCode._NSPACE; lastclss = Character.MAX_VALUE; break; case 'd': opcodeFlag = true; clss = OpCode._DIGIT; lastclss = Character.MAX_VALUE; break; case 'D': opcodeFlag = true; clss = OpCode._NDIGIT; lastclss = Character.MAX_VALUE; break; case 'n': clss = '\n'; break; case 'r': clss = '\r'; break; case 't': clss = '\t'; break; case 'f': clss = '\f'; break; case 'b': clss = '\b'; break; case 'e': clss = '\033'; break; case 'a': clss = '\007'; break; case 'x': clss = (char)__parseHex(__input._array, __input._getOffset(), 2, numLength); __input._increment(numLength[0]); break; case 'c': clss = __input._postIncrement(); if(Character.isLowerCase(clss)) clss = Character.toUpperCase(clss); clss ^= 64; break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': clss = (char)__parseOctal(__input._array, __input._getOffset() - 1, 3, numLength); __input._increment(numLength[0] - 1); break; default: 
            break;
          }
        }
      }

      if(range) {
        if(lastclss > clss)
          throw new MalformedPatternException(
                         "Invalid [] range in expression.");
        range = false;
      } else {
        lastclss = clss;

        if(opcodeFlag == false &&
           __input._getValue() == '-' &&
           __input._getOffset() + 1 < __input._getLength() &&
           __input._getValueRelative(1) != ']') {
          __input._increment();
          range = true;
          continue;
        }
      }

      if(lastclss == clss) {
        if(opcodeFlag == true) {
          if(negFlag[0] == false)
            __emitCode(OpCode._OPCODE);
          else
            __emitCode(OpCode._NOPCODE);
        } else
          __emitCode(OpCode._ONECHAR);

        __emitCode(clss);

        if((__modifierFlags[0] & __CASE_INSENSITIVE) != 0 &&
           Character.isUpperCase(clss) && Character.isUpperCase(lastclss)){
          __programSize--;
          __emitCode(Character.toLowerCase(clss));
        }
      }

      if(lastclss < clss) {
        __emitCode(OpCode._RANGE);
        __emitCode(lastclss);
        __emitCode(clss);

        if((__modifierFlags[0] & __CASE_INSENSITIVE) != 0 &&
           Character.isUpperCase(clss) && Character.isUpperCase(lastclss)){
          __programSize-=2;
          __emitCode(Character.toLowerCase(lastclss));
          __emitCode(Character.toLowerCase(clss));
        }

        lastclss = Character.MAX_VALUE;
        range = false;
      }

      lastclss = clss;
    }

    if(__input._getValue() != ']')
      throw new MalformedPatternException("Unmatched [] in expression.");

    __getNextChar();
    __emitCode(OpCode._END);

    return offset;
  }

  /**
   * Parse a POSIX expression like [:foo:].
   *
   * @return OpCode. Returns 0 when parsing the POSIX expression fails.
*/ private char __parsePOSIX(boolean negFlag[]) throws MalformedPatternException { int offset = __input._getOffset(); int len = __input._getLength(); int pos = offset; char value = __input._getValue(pos++); StringBuffer buf; Object opcode; if( value != ':' ) return 0; if( __input._getValue(pos) == '^' ) { negFlag[0] = true; pos++; } else { negFlag[0] = false; } buf = new StringBuffer(); try { while ( (value = __input._getValue(pos++)) != ':' && pos < len) { buf.append(value); } } catch (Exception e){ return 0; } if( __input._getValue(pos++) != ']'){ return 0; } opcode = __hashPOSIX.get(buf.toString()); if( opcode == null ) return 0; __input._setOffset(pos); return ((Character)opcode).charValue(); } private int __parseBranch(int[] retFlags) throws MalformedPatternException { boolean nestCheck = false, handleRepetition = false; int offset, next, min, max, flags[] = { 0 }; char operator, value; min = 0; max = Character.MAX_VALUE; offset = __parseAtom(flags); if(offset == OpCode._NULL_OFFSET) { if((flags[0] & __TRYAGAIN) != 0) retFlags[0] |= __TRYAGAIN; return OpCode._NULL_OFFSET; } operator = __input._getValue(); if(operator == '(' && __input._getValueRelative(1) == '?' 
&& __input._getValueRelative(2) == '#') { while(operator != CharStringPointer._END_OF_STRING && operator != ')') operator = __input._increment(); if(operator != CharStringPointer._END_OF_STRING) { __getNextChar(); operator = __input._getValue(); } } if(operator == '{' && __parseRepetition(__input._array, __input._getOffset())) { int maxOffset, pos; next = __input._getOffset() + 1; pos = maxOffset = __input._getLength(); value = __input._getValue(next); while(Character.isDigit(value) || value == ',') { if(value == ',') { if(pos != maxOffset) break; else pos = next; } ++next; value = __input._getValue(next); } if(value == '}') { int num; StringBuffer buffer = new StringBuffer(10); if(pos == maxOffset) pos = next; __input._increment(); num = __input._getOffset(); value = __input._getValue(num); while(Character.isDigit(value)) { buffer.append(value); ++num; value = __input._getValue(num); } try { min = Integer.parseInt(buffer.toString()); } catch(NumberFormatException e) { throw new MalformedPatternException( "Unexpected number format exception. Please report this bug." + "NumberFormatException message: " + e.getMessage()); } value = __input._getValue(pos); if(value == ',') ++pos; else pos = __input._getOffset(); num = pos; buffer = new StringBuffer(10); value = __input._getValue(num); while(Character.isDigit(value)){ buffer.append(value); ++num; value = __input._getValue(num); } try { if(num != pos) max = Integer.parseInt(buffer.toString()); } catch(NumberFormatException e) { throw new MalformedPatternException( "Unexpected number format exception. Please report this bug." + "NumberFormatException message: " + e.getMessage()); } if(max == 0 && __input._getValue(pos) != '0') max = Character.MAX_VALUE; __input._setOffset(next); __getNextChar(); nestCheck = true; handleRepetition = true; } } if(!nestCheck) { handleRepetition = false; if(!__isSimpleRepetitionOp(operator)) { retFlags[0] = flags[0]; return offset; } __getNextChar(); retFlags[0] = ((operator != '+') ? 
(__WORSTCASE | __SPSTART) : (__WORSTCASE | __NONNULL)); if(operator == '*' && ((flags[0] & __SIMPLE) != 0)) { __programInsertOperator(OpCode._STAR, offset); __cost+=4; } else if(operator == '*') { min = 0; handleRepetition = true; } else if(operator == '+' && (flags[0] & __SIMPLE) != 0) { __programInsertOperator(OpCode._PLUS, offset); __cost+=3; } else if(operator == '+') { min = 1; handleRepetition = true; } else if(operator == '?') { min = 0; max = 1; handleRepetition = true; } } if(handleRepetition) { // handle repetition if((flags[0] & __SIMPLE) != 0){ __cost+= ((2 + __cost) / 2); __programInsertOperator(OpCode._CURLY, offset); } else { __cost += (4 + __cost); __programAddTail(offset, __emitNode(OpCode._WHILEM)); __programInsertOperator(OpCode._CURLYX, offset); __programAddTail(offset, __emitNode(OpCode._NOTHING)); } if(min > 0) retFlags[0] = (__WORSTCASE | __NONNULL); if(max != 0 && max < min) throw new MalformedPatternException( "Invalid interval {" + min + "," + max + "}"); if(__program!= null) { __program[offset + 2] = (char)min; __program[offset + 3] = (char)max; } } if(__input._getValue() == '?') { __getNextChar(); __programInsertOperator(OpCode._MINMOD, offset); __programAddTail(offset, offset + 2); } if(__isComplexRepetitionOp(__input._array, __input._getOffset())) throw new MalformedPatternException( "Nested repetitions *?+ in expression"); return offset; } private int __parseExpression(boolean isParenthesized, int[] hintFlags) throws MalformedPatternException { char value, paren; char[] modifierFlags, posFlags = { 0 }, negFlags = { 0 }; int nodeOffset = OpCode._NULL_OFFSET, parenthesisNum = 0, br, ender; int[] flags = { 0 };; String modifiers = "iogmsx-"; modifierFlags = posFlags; // Initially we assume expression doesn't match null string. hintFlags[0] = __NONNULL; if (isParenthesized) { paren = 1; if(__input._getValue() == '?') { __input._increment(); paren = value = __input._postIncrement(); switch(value) { case ':' : case '=' : case '!' 
: break; case '#' : value = __input._getValue(); while(value != CharStringPointer._END_OF_STRING && value != ')') value = __input._increment(); if(value != ')') throw new MalformedPatternException( "Sequence (?#... not terminated"); __getNextChar(); hintFlags[0] = __TRYAGAIN; return OpCode._NULL_OFFSET; default : __input._decrement(); value = __input._getValue(); while(value != CharStringPointer._END_OF_STRING && modifiers.indexOf(value) != -1) { if(value == '-') modifierFlags = negFlags; else __setModifierFlag(modifierFlags, value); value = __input._increment(); } __modifierFlags[0] |= posFlags[0]; __modifierFlags[0] &= ~negFlags[0]; if(value != ')') throw new MalformedPatternException( "Sequence (?" + value + "...) not recognized"); __getNextChar(); hintFlags[0] = __TRYAGAIN; return OpCode._NULL_OFFSET; } } else { parenthesisNum = __numParentheses; ++__numParentheses; nodeOffset = __emitArgNode(OpCode._OPEN, (char)parenthesisNum); } } else paren = 0; br = __parseAlternation(flags); if(br == OpCode._NULL_OFFSET) return OpCode._NULL_OFFSET; if(nodeOffset != OpCode._NULL_OFFSET) __programAddTail(nodeOffset, br); else nodeOffset = br; if((flags[0] & __NONNULL) == 0) hintFlags[0] &= ~__NONNULL; hintFlags[0] |= (flags[0] & __SPSTART); while(__input._getValue() == '|') { __getNextChar(); br = __parseAlternation(flags); if(br == OpCode._NULL_OFFSET) return OpCode._NULL_OFFSET; __programAddTail(nodeOffset, br); if((flags[0] & __NONNULL) == 0) hintFlags[0] &= ~__NONNULL; hintFlags[0] |= (flags[0] & __SPSTART); } switch(paren) { case ':' : ender = __emitNode(OpCode._NOTHING); break; case 1: ender = __emitArgNode(OpCode._CLOSE, (char)parenthesisNum); break; case '=': case '!': ender = __emitNode(OpCode._SUCCEED); hintFlags[0] &= ~__NONNULL; break; case 0 : default : ender = __emitNode(OpCode._END); break; } __programAddTail(nodeOffset, ender); for(br = nodeOffset; br != OpCode._NULL_OFFSET; br = OpCode._getNext(__program, br)) __programAddOperatorTail(br, ender); if(paren == 
'=') { __programInsertOperator(OpCode._IFMATCH, nodeOffset); __programAddTail(nodeOffset, __emitNode(OpCode._NOTHING)); } else if(paren == '!') { __programInsertOperator(OpCode._UNLESSM, nodeOffset); __programAddTail(nodeOffset, __emitNode(OpCode._NOTHING)); } if(paren != 0 && (__input._isAtEnd() || __getNextChar() != ')')) { throw new MalformedPatternException("Unmatched parentheses."); } else if(paren == 0 && !__input._isAtEnd()) { if(__input._getValue() == ')') throw new MalformedPatternException("Unmatched parentheses."); else // Should never happen. throw new MalformedPatternException( "Unreached characters at end of expression. Please report this bug!"); } return nodeOffset; } /** * Compiles a Perl5 regular expression into a Perl5Pattern instance that * can be used by a Perl5Matcher object to perform pattern matching. * Please see the user's guide for more information about Perl5 regular * expressions. *
* @param pattern A Perl5 regular expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the regular expression. The flags * are a logical OR of any number of the five MASK * constants. For example: *
* regex = * compiler.compile(pattern, Perl5Compiler. * CASE_INSENSITIVE_MASK | * Perl5Compiler.MULTILINE_MASK); ** This says to compile the pattern so that it treats * input as consisting of multiple lines and to perform * matches in a case insensitive manner. * @return A Pattern instance constituting the compiled regular expression. * This instance will always be a Perl5Pattern and can be reliably * casted to a Perl5Pattern. * @exception MalformedPatternException If the compiled expression * is not a valid Perl5 regular expression. */ public Pattern compile(char[] pattern, int options) throws MalformedPatternException { int[] flags = { 0 }; int caseInsensitive, scan; Perl5Pattern regexp; String mustString, startString; int first; boolean sawOpen = false, sawPlus = false; StringBuffer lastLongest, longest; int length, minLength = 0, curBack, back, backmost; __input = new CharStringPointer(pattern); caseInsensitive = options & __CASE_INSENSITIVE; __modifierFlags[0] = (char)options; __sawBackreference = false; __numParentheses = 1; __programSize = 0; __cost = 0; __program= null; __emitCode((char)0); if(__parseExpression(false, flags) == OpCode._NULL_OFFSET) throw new MalformedPatternException("Unknown compilation error."); if(__programSize >= Character.MAX_VALUE - 1) throw new MalformedPatternException("Expression is too large."); __program= new char[__programSize]; regexp = new Perl5Pattern(); regexp._program = __program; regexp._expression = new String(pattern); __input._setOffset(0); __numParentheses = 1; __programSize = 0; __cost = 0; __emitCode((char)0); if(__parseExpression(false, flags) == OpCode._NULL_OFFSET) throw new MalformedPatternException("Unknown compilation error."); caseInsensitive = __modifierFlags[0] & __CASE_INSENSITIVE; regexp._isExpensive = (__cost >= 10); regexp._startClassOffset = OpCode._NULL_OFFSET; regexp._anchor = 0; regexp._back = -1; regexp._options = options; regexp._startString = null; regexp._mustString = null; mustString = null; 
startString = null; scan = 1; if(__program[OpCode._getNext(__program, scan)] == OpCode._END){ boolean doItAgain; // bad variables names! char op; first = scan = OpCode._getNextOperator(scan); op = __program[first]; while((op == OpCode._OPEN && (sawOpen = true)) || (op == OpCode._BRANCH && __program[OpCode._getNext(__program, first)] != OpCode._BRANCH) || op == OpCode._PLUS || op == OpCode._MINMOD || (OpCode._opType[op] == OpCode._CURLY && OpCode._getArg1(__program, first) > 0)) { if(op == OpCode._PLUS) sawPlus = true; else first+=OpCode._operandLength[op]; first = OpCode._getNextOperator(first); op = __program[first]; } doItAgain = true; while(doItAgain) { doItAgain = false; op = __program[first]; if(op == OpCode._EXACTLY) { startString = new String(__program, OpCode._getOperand(first + 1), __program[OpCode._getOperand(first)]); } else if(OpCode._isInArray(op, OpCode._opLengthOne, 2)) regexp._startClassOffset = first; else if(op == OpCode._BOUND || op == OpCode._NBOUND) regexp._startClassOffset = first; else if(OpCode._opType[op] == OpCode._BOL) { if(op == OpCode._BOL) regexp._anchor = Perl5Pattern._OPT_ANCH_BOL; else if(op == OpCode._MBOL) regexp._anchor = Perl5Pattern._OPT_ANCH_MBOL; else regexp._anchor = Perl5Pattern._OPT_ANCH; first = OpCode._getNextOperator(first); doItAgain = true; continue; } else if(op == OpCode._STAR && OpCode._opType[__program[OpCode._getNextOperator(first)]] == OpCode._ANY && (regexp._anchor & Perl5Pattern._OPT_ANCH) != 0) { regexp._anchor = Perl5Pattern._OPT_ANCH | Perl5Pattern._OPT_IMPLICIT; first = OpCode._getNextOperator(first); doItAgain = true; continue; } } // end while do it again if(sawPlus && (!sawOpen || !__sawBackreference)) regexp._anchor |= Perl5Pattern._OPT_SKIP; lastLongest = new StringBuffer(); longest = new StringBuffer(); length = 0; minLength = 0; curBack = 0; back = 0; backmost = 0; while(scan > 0 && (op = __program[scan]) != OpCode._END) { if(op == OpCode._BRANCH) { if(__program[OpCode._getNext(__program, scan)] == 
OpCode._BRANCH) { curBack = -30000; while(__program[scan] == OpCode._BRANCH) scan = OpCode._getNext(__program, scan); } else scan = OpCode._getNextOperator(scan); continue; } if(op == OpCode._UNLESSM) { curBack = -30000; scan = OpCode._getNext(__program, scan); continue; } if(op == OpCode._EXACTLY) { int temp; first = scan; while(__program[(temp = OpCode._getNext(__program, scan))] == OpCode._CLOSE) scan = temp; minLength += __program[OpCode._getOperand(first)]; temp = __program[OpCode._getOperand(first)]; if(curBack - back == length) { lastLongest.append(new String(__program, OpCode._getOperand(first) + 1, temp)); length += temp; curBack += temp; first = OpCode._getNext(__program, scan); } else if(temp >= (length + (curBack >= 0 ? 1 : 0))) { length = temp; lastLongest = new StringBuffer(new String(__program, OpCode._getOperand(first) + 1, temp)); back = curBack; curBack += length; first = OpCode._getNext(__program, scan); } else curBack += temp; } else if(OpCode._isInArray(op, OpCode._opLengthVaries, 0)) { curBack = -30000; length = 0; if(lastLongest.length() > longest.length()) { longest = lastLongest; backmost = back; } lastLongest = new StringBuffer(); if(op == OpCode._PLUS && OpCode._isInArray(__program[OpCode._getNextOperator(scan)], OpCode._opLengthOne, 0)) ++minLength; else if(OpCode._opType[op] == OpCode._CURLY && OpCode._isInArray(__program[OpCode._getNextOperator(scan) + 2], OpCode._opLengthOne, 0)) minLength += OpCode._getArg1(__program, scan); } else if(OpCode._isInArray(op, OpCode._opLengthOne, 0)) { ++curBack; ++minLength; length = 0; if(lastLongest.length() > longest.length()) { longest = lastLongest; backmost = back; } lastLongest = new StringBuffer(); } scan = OpCode._getNext(__program, scan); } // end while if(lastLongest.length() + ((OpCode._opType[__program[first]] == OpCode._EOL) ? 
1 : 0) > longest.length()) { longest = lastLongest; backmost = back; } else lastLongest = new StringBuffer(); if(longest.length() > 0 && startString == null) { mustString = longest.toString(); if(backmost < 0) backmost = -1; regexp._back = backmost; /* if(longest.length() > (((caseInsensitive & __CASE_INSENSITIVE) != 0 || OpCode._opType[__program[first]] == OpCode._EOL) ? 1 : 0)) */ } else longest = null; } // end if regexp._isCaseInsensitive = ((caseInsensitive & __CASE_INSENSITIVE) != 0); regexp._numParentheses = __numParentheses - 1; regexp._minLength = minLength; if(mustString != null) { regexp._mustString = mustString.toCharArray(); regexp._mustUtility = 100; } if(startString != null) regexp._startString = startString.toCharArray(); return regexp; } /** * Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK); *
* @param pattern A regular expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be a Perl5Pattern and can be reliably
* cast to a Perl5Pattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Perl5 regular expression.
*/
public Pattern compile(char[] pattern) throws MalformedPatternException {
  return compile(pattern, DEFAULT_MASK);
}

/**
* Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
*
* @param pattern A regular expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be a Perl5Pattern and can be reliably
* cast to a Perl5Pattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Perl5 regular expression.
*/
public Pattern compile(String pattern) throws MalformedPatternException {
  return compile(pattern.toCharArray(), DEFAULT_MASK);
}

/**
* Compiles a Perl5 regular expression into a Perl5Pattern instance that
* can be used by a Perl5Matcher object to perform pattern matching.
* Please see the user's guide for more information about Perl5 regular
* expressions.
*
* @param pattern A Perl5 regular expression to compile.
* @param options A set of flags giving the compiler instructions on
* how to treat the regular expression. The flags
* are a logical OR of any number of the five MASK
* constants. For example:
*
* regex = * compiler.compile("^\\w+\\d+$", * Perl5Compiler.CASE_INSENSITIVE_MASK | * Perl5Compiler.MULTILINE_MASK); ** This says to compile the pattern so that it treats * input as consisting of multiple lines and to perform * matches in a case insensitive manner. * @return A Pattern instance constituting the compiled regular expression. * This instance will always be a Perl5Pattern and can be reliably * casted to a Perl5Pattern. * @exception MalformedPatternException If the compiled expression * is not a valid Perl5 regular expression. */ public Pattern compile(String pattern, int options) throws MalformedPatternException { return compile(pattern.toCharArray(), options); } } jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Repetition.java 0000644 0001750 0001750 00000006230 07773723336 025575 0 ustar arnaud arnaud /* * $Id: Perl5Repetition.java,v 1.7 2003/11/07 20:16:25 dfs Exp $ * * ==================================================================== * The Apache Software License, Version 1.1 * * Copyright (c) 2000 The Apache Software Foundation. All rights * reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. The end-user documentation included with the redistribution, * if any, must include the following acknowledgment: * "This product includes software developed by the * Apache Software Foundation (http://www.apache.org/)." * Alternately, this acknowledgment may appear in the software itself, * if and wherever such third-party acknowledgments normally appear. * * 4. 
The names "Apache" and "Apache Software Foundation", "Jakarta-Oro" * must not be used to endorse or promote products derived from this * software without prior written permission. For written * permission, please contact apache@apache.org. * * 5. Products derived from this software may not be called "Apache" * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their * name, without prior written permission of the Apache Software Foundation. * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see *
* match.
* Pattern matching methods that do not match subgroups will only contain
* entries for group 0, which always refers to the entire pattern.
* beginGroupOffset contains the start offset of the groups,
* indexed by group number, which will always be 0 for group 0.
* endGroupOffset contains the ending offset + 1 of the groups.
* A group matching the null string will have beginGroupOffset
* and endGroupOffset entries of equal value. Following a
* convention established by the GNU regular expression library for the
* C language, groups that are not part of a match contain -1 as their
* begin and end offsets.
*/
int[] _beginGroupOffset, _endGroupOffset;
/**
* The entire string that matched the pattern.
*/
String _match;
/**
* Constructs a MatchResult able to store match information for
* a number of subpattern groups.
*
* @param groups The number of groups this MatchResult can store.
* Only positive values greater than or equal to 1 make any
* sense. At minimum, a MatchResult stores one group which
* represents the entire pattern matched including all subparts.
*/
Perl5MatchResult(int groups){
  _beginGroupOffset = new int[groups];
  _endGroupOffset = new int[groups];
}

/**
* @return The length of the match.
*/
public int length(){
  int length;

  length = (_endGroupOffset[0] - _beginGroupOffset[0]);

  return (length > 0 ? length : 0);
}

/**
* @return The number of groups contained in the result. This number
* includes the 0th group. In other words, the result refers
* to the number of parenthesized subgroups plus the entire match
* itself.
*/
public int groups(){
  return _beginGroupOffset.length;
}

/**
* @param group The pattern subgroup to return.
* @return A string containing the indicated pattern subgroup. Group
* 0 always refers to the entire match. If a group was never
* matched, it returns null. This is not to be confused with
* a group matching the null string, which will return a String
* of length 0.
*/
public String group(int group){
  int begin, end, length;

  if(group < _beginGroupOffset.length){
    begin = _beginGroupOffset[group];
    end = _endGroupOffset[group];
    length = _match.length();

    if(begin >= 0 && end >= 0) {
      if(begin < length && end <= length && end > begin)
        return _match.substring(begin, end);
      else if(begin <= end)
        return "";
    }
  }

  return null;
}

/**
* @param group The pattern subgroup.
* @return The offset into group 0 of the first token in the indicated
* pattern subgroup. If a group was never matched or does
* not exist, returns -1.
*/
public int begin(int group){
  int begin, end;//, length;
  if(group < _beginGroupOffset.length){
    begin = _beginGroupOffset[group];
    end = _endGroupOffset[group];
    //length = _match.length();
    if(begin >= 0 && end >= 0)// && begin < length && end <= length)
      //return _beginGroupOffset[group];
      return begin;
  }
  return -1;
}

/**
* @param group The pattern subgroup.
* @return Returns one plus the offset into group 0 of the last token in
* the indicated pattern subgroup. If a group was never matched
* or does not exist, returns -1. A group matching the null
* string will return its start offset.
*/
public int end(int group){
  int begin, end; //, length;
  if(group < _beginGroupOffset.length){
    begin = _beginGroupOffset[group];
    end = _endGroupOffset[group];
    //length = _match.length();
    if(begin >= 0 && end >= 0)// && begin < length && end <= length)
      //return _endGroupOffset[group];
      return end;
  }
  return -1;
}

/**
* Returns an offset marking the beginning of the pattern match
* relative to the beginning of the input.
*
* @param group The pattern subgroup.
* @return The offset of the first token in the indicated
* pattern subgroup. If a group was never matched or does
* not exist, returns -1.
*/
public int beginOffset(int group){
  int begin, end;//, length;
  if(group < _beginGroupOffset.length){
    begin = _beginGroupOffset[group];
    end = _endGroupOffset[group];
    //length = _match.length();
    if(begin >= 0 && end >= 0)// && begin < length && end <= length)
      //return _matchBeginOffset + _beginGroupOffset[group];
      return _matchBeginOffset + begin;
  }
  return -1;
}

/**
* Returns an offset marking the end of the pattern match
* relative to the beginning of the input.
*
* @param group The pattern subgroup.
* @return Returns one plus the offset of the last token in
* the indicated pattern subgroup. If a group was never matched
* or does not exist, returns -1. A group matching the null
* string will return its start offset.
*/
public int endOffset(int group){
  int begin, end;//, length;
  if(group < _endGroupOffset.length){
    begin = _beginGroupOffset[group];
    end = _endGroupOffset[group];
    //length = _match.length();
    if(begin >= 0 && end >= 0)// && begin < length && end <= length)
      //return _matchBeginOffset + _endGroupOffset[group];
      return _matchBeginOffset + end;
  }
  return -1;
}

/**
* The same as group(0).
*
* @return A string containing the entire match.
*/
public String toString() {
  return group(0);
}
}
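The offset conventions documented in Perl5MatchResult above can be illustrated with a small standalone sketch. It uses only the JDK and a hypothetical class, GroupOffsetSketch, which is not part of the library. The class name, constructor, and sample data are assumptions made for illustration, and the bounds checks against the match length are left out. It shows that begin()/end() are relative to group 0, that beginOffset()/endOffset() add the position of the match within the original input, and that unmatched groups report -1 or null.

```java
/**
 * Hypothetical stand-in for the offset bookkeeping described in
 * Perl5MatchResult. This is NOT the library class; it only mirrors
 * the documented conventions in simplified form.
 */
final class GroupOffsetSketch {
    private final String match;         // text of group 0 (the whole match)
    private final int matchBeginOffset; // where group 0 starts in the original input
    private final int[] begin;          // per-group start, relative to group 0; -1 if unmatched
    private final int[] end;            // per-group end + 1, relative to group 0; -1 if unmatched

    GroupOffsetSketch(String match, int matchBeginOffset, int[] begin, int[] end) {
        this.match = match;
        this.matchBeginOffset = matchBeginOffset;
        this.begin = begin;
        this.end = end;
    }

    // A group counts as matched only if it exists and both offsets are non-negative.
    private boolean matched(int g) {
        return g >= 0 && g < begin.length && begin[g] >= 0 && end[g] >= 0;
    }

    // null for a group that never matched, "" for a group matching the null string.
    String group(int g) {
        if (!matched(g)) return null;
        if (end[g] > begin[g]) return match.substring(begin[g], end[g]);
        return end[g] == begin[g] ? "" : null;
    }

    int begin(int g)       { return matched(g) ? begin[g] : -1; }
    int end(int g)         { return matched(g) ? end[g] : -1; }
    int beginOffset(int g) { return matched(g) ? matchBeginOffset + begin[g] : -1; }
    int endOffset(int g)   { return matched(g) ? matchBeginOffset + end[g] : -1; }

    public static void main(String[] args) {
        // Assumed scenario: "hello world" matched at offset 4 of "say hello world";
        // groups 1 and 2 matched "hello" and "world", group 3 never matched.
        GroupOffsetSketch r = new GroupOffsetSketch("hello world", 4,
            new int[] {0, 0, 6, -1}, new int[] {11, 5, 11, -1});
        System.out.println(r.group(2));       // world
        System.out.println(r.begin(2));       // 6  (relative to group 0)
        System.out.println(r.beginOffset(2)); // 10 (relative to the input)
        System.out.println(r.endOffset(2));   // 15
        System.out.println(r.group(3));       // null
    }
}
```

Keeping the group arrays relative to group 0 and adding the match position only in beginOffset()/endOffset() is what lets the same stored offsets serve both views.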
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Substitution.java 0000644 0001750 0001750 00000013357 07773723336 025267 0 ustar arnaud arnaud /*
* $Id: Substitution.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* For performance reasons, rather than provide a getSubstitution method
* that returns a String used by Util.substitute, we have opted to pass
* a StringBuffer argument from Util.substitute to which the Substitution
* must append data. The contract that an appendSubstitution
* implementation must abide by is that the appendBuffer may only be
* appended to. appendSubstitution() may not alter the appendBuffer in
* any way other than appending to it.
*
* This method is invoked by Util.substitute every time it finds a match.
* After finding a match, Util.substitute appends to the appendBuffer
* all of the original input occurring between the end of the last match
* and the beginning of the current match. Then it invokes
* appendSubstitution(), passing the appendBuffer, current match, and
* other information as arguments. The substitutionCount keeps track
* of how many substitutions have been performed so far by an invocation
* of Util.substitute. Its value starts at 1 when the first substitution
* is found and appendSubstitution is invoked for the first time. It
* will NEVER be zero or a negative value.
*
* @param appendBuffer The buffer containing the new string resulting
* from performing substitutions on the original input.
* @param match The current match causing a substitution to be made.
* @param substitutionCount The number of substitutions that have been
* performed so far by Util.substitute.
* @param originalInput The original input upon which the substitutions are
* being performed. The Substitution must treat this parameter as read only.
* @param matcher The PatternMatcher used to find the current match.
* @param pattern The Pattern used to find the current match.
*/
public void appendSubstitution(StringBuffer appendBuffer, MatchResult match,
int substitutionCount,
PatternMatcherInput originalInput,
PatternMatcher matcher, Pattern pattern);
}
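The append-only contract described for appendSubstitution can be sketched with JDK classes alone. The example below is an assumption-laden analogue, not the library's implementation: java.util.regex stands in for the ORO matcher, and SubstituteLoopSketch and AppendOnlySubstitution are hypothetical names. It copies the input between matches into the buffer, hands the buffer to the callback, and passes a substitution count that starts at 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the driver loop described for Util.substitute. */
final class SubstituteLoopSketch {

    /** Simplified analogue of Substitution: implementations may only append. */
    interface AppendOnlySubstitution {
        void appendSubstitution(StringBuilder appendBuffer, String match,
                                int substitutionCount);
    }

    static String substitute(Pattern pattern, AppendOnlySubstitution sub,
                             String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = pattern.matcher(input);
        int lastEnd = 0; // end offset of the previous match
        int count = 0;   // becomes 1 on the first substitution, never 0 in a callback

        while (m.find()) {
            // Copy the original input lying between the last match and this one.
            out.append(input, lastEnd, m.start());
            // Let the substitution append its replacement text.
            sub.appendSubstitution(out, m.group(), ++count);
            lastEnd = m.end();
        }
        // Copy whatever follows the final match.
        out.append(input, lastEnd, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String result = substitute(Pattern.compile("\\d+"),
            (buf, match, n) -> buf.append('#').append(n), "a1b22c333");
        System.out.println(result); // a#1b#2c#3
    }
}
```

Because the callback only appends, the driver never needs to re-scan or copy the buffer between substitutions, which is the performance motivation the interface documentation gives.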
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/PatternMatcherInput.java 0000644 0001750 0001750 00000044536 07773723336 026517 0 ustar arnaud arnaud /*
* $Id: PatternMatcherInput.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* contains() methods of PatternMatcher instances.
* It is also used to specify that only a subregion of a string
* should be used as input when looking for a pattern match. All that
* is meant by preserving state is that the end offset of the last match
* is remembered, so that the next match is performed from that point
* where the last match left off. This offset can be accessed from
* the {@link #getCurrentOffset()} method and can be set with the
* {@link #setCurrentOffset(int)} method.
*
* You would use a PatternMatcherInput object when you want to search for
* more than just the first occurrence of a pattern in a string, or when
* you only want to search a subregion of the string for a match. An
* example of its most common use is:
*
* PatternMatcher matcher;
* PatternCompiler compiler;
* Pattern pattern;
* PatternMatcherInput input;
* MatchResult result;
*
* compiler = new Perl5Compiler();
* matcher = new Perl5Matcher();
*
* try {
*   pattern = compiler.compile(somePatternString);
* } catch(MalformedPatternException e) {
*   System.out.println("Bad pattern.");
*   System.out.println(e.getMessage());
*   return;
* }
*
* input = new PatternMatcherInput(someStringInput);
*
* while(matcher.contains(input, pattern)) {
*   result = matcher.getMatch();
*   // Perform whatever processing on the result you want.
* }
* // Suppose we want to start searching from the beginning again with
* // a different pattern.
* // Just set the current offset to the begin offset.
* input.setCurrentOffset(input.getBeginOffset());
*
* // Second search omitted
*
* // Suppose we're done with this input, but want to search another string.
* // There's no need to create another PatternMatcherInput instance.
* // We can just use the setInput() method.
* input.setInput(aNewInputString);
*
* @version @version@
* @since 1.0
* @see PatternMatcher
*/
public final class PatternMatcherInput {
String _originalStringInput;
char[] _originalCharInput, _originalBuffer, _toLowerBuffer;
int _beginOffset, _endOffset, _currentOffset;
int _matchBeginOffset = -1, _matchEndOffset = -1;

/**
* Creates a PatternMatcherInput object, associating a region of a String
* as input to be used for pattern matching by PatternMatcher objects.
* A copy of the string is not made, therefore you should not modify
* the string unless you know what you are doing.
* The current offset of the PatternMatcherInput is set to the begin
* offset of the region.
*
* @param input The input to associate with the PatternMatcherInput.
* @param begin The offset into the String to use as the beginning of
* the input.
* @param length The length of the region starting from the begin offset
* to use as the input for pattern matching purposes.
*/
public PatternMatcherInput(String input, int begin, int length) {
  setInput(input, begin, length);
}

/**
* Like calling
*
*   PatternMatcherInput(input, 0, input.length());
*
* @param input The input to associate with the PatternMatcherInput.
*/
public PatternMatcherInput(String input) {
  this(input, 0, input.length());
}

/**
* Creates a PatternMatcherInput object, associating a region of a string
* (represented as a char[]) as input
* to be used for pattern matching by PatternMatcher objects.
* A copy of the string is not made, therefore you should not modify
* the string unless you know what you are doing.
* The current offset of the PatternMatcherInput is set to the begin
* offset of the region.
*
* @param input The input to associate with the PatternMatcherInput.
* @param begin The offset into the char[] to use as the beginning of
* the input.
* @param length The length of the region starting from the begin offset
* to use as the input for pattern matching purposes.
*/
public PatternMatcherInput(char[] input, int begin, int length) {
  setInput(input, begin, length);
}

/**
* Like calling:
*
*   PatternMatcherInput(input, 0, input.length);
*
* @param input The input to associate with the PatternMatcherInput.
*/
public PatternMatcherInput(char[] input) {
  this(input, 0, input.length);
}

/**
* @return The length of the region to be considered input for pattern
* matching purposes. Essentially this is the end offset minus
* the begin offset.
*/
public int length() {
  return (_endOffset - _beginOffset);
  //return _originalBuffer.length;
}

/**
* Associates a region of a String as input
* to be used for pattern matching by PatternMatcher objects.
* The current offset of the PatternMatcherInput is set to the begin
* offset of the region.
*
* @param input The input to associate with the PatternMatcherInput.
* @param begin The offset into the String to use as the beginning of
* the input.
* @param length The length of the region starting from the begin offset
* to use as the input for pattern matching purposes.
*/
public void setInput(String input, int begin, int length) {
  _originalStringInput = input;
  _originalCharInput = null;
  _toLowerBuffer = null;
  _originalBuffer = input.toCharArray();
  setCurrentOffset(begin);
  setBeginOffset(begin);
  setEndOffset(_beginOffset + length);
}

/**
* This method is identical to calling:
*
*   setInput(input, 0, input.length());
*
* @param input The input to associate with the PatternMatcherInput.
*/
public void setInput(String input) {
  setInput(input, 0, input.length());
}

/**
* Associates a region of a string (represented as a char[]) as input
* to be used for pattern matching by PatternMatcher objects.
* A copy of the string is not made, therefore you should not modify
* the string unless you know what you are doing.
* The current offset of the PatternMatcherInput is set to the begin
* offset of the region.
*
* @param input The input to associate with the PatternMatcherInput.
* @param begin The offset into the char[] to use as the beginning of
* the input.
* @param length The length of the region starting from the begin offset
* to use as the input for pattern matching purposes.
*/
public void setInput(char[] input, int begin, int length) {
  _originalStringInput = null;
  _toLowerBuffer = null;
  _originalBuffer = _originalCharInput = input;
  setCurrentOffset(begin);
  setBeginOffset(begin);
  setEndOffset(_beginOffset + length);
}

/**
* This method is identical to calling:
*
*   setInput(input, 0, input.length);
*
* @param input The input to associate with the PatternMatcherInput.
*/
public void setInput(char[] input) {
  setInput(input, 0, input.length);
}

/**
* Returns the character at a particular offset relative to the begin
* offset of the input.
*
* @param offset The offset at which to fetch a character (relative to * the beginning offset. * @return The character at a particular offset. * @exception ArrayIndexOutOfBoundsException If the offset does not occur * within the bounds of the input. */ public char charAt(int offset) { return _originalBuffer[_beginOffset + offset]; } /** * Returns a new string that is a substring of the PatternMatcherInput * instance. The substring begins at the specified beginOffset relative * to the begin offset and extends to the specified endOffset - 1 * relative to the begin offset of the PatternMatcherInput instance. *
* @param beginOffset The offset relative to the begin offset of the * PatternMatcherInput at which to start the substring (inclusive). * @param endOffset The offset relative to the begin offset of the * PatternMatcherInput at which to end the substring (exclusive). * @return The specified substring. * @exception ArrayIndexOutOfBoundsException If one of the offsets does * not occur within the bounds of the input. */ public String substring(int beginOffset, int endOffset) { return new String(_originalBuffer, _beginOffset+beginOffset, endOffset - beginOffset); } /** * Returns a new string that is a substring of the PatternMatcherInput * instance. The substring begins at the specified beginOffset relative * to the begin offset and extends to the end offset of the * PatternMatcherInput. *
 * @param beginOffset The offset relative to the begin offset of the
 *        PatternMatcherInput at which to start the substring.
 * @return The specified substring.
 * @exception ArrayIndexOutOfBoundsException If the offset does not occur
 *            within the bounds of the input.
 */
public String substring(int beginOffset) {
  beginOffset+=_beginOffset;
  return new String(_originalBuffer, beginOffset, _endOffset - beginOffset);
}

/**
 * Retrieves the original input used to initialize the PatternMatcherInput
 * instance. If a String was used, the String instance will be returned.
 * If a char[] was used, a char[] instance will be returned. This violates
 * data encapsulation and hiding principles, but it is a great convenience
 * for the programmer.
 *
 * @return The String or char[] input used to initialize the
 *         PatternMatcherInput instance.
 */
public Object getInput(){
  if(_originalStringInput == null)
    return _originalCharInput;
  return _originalStringInput;
}

/**
 * Retrieves the char[] buffer to be used as input by PatternMatcher
 * implementations to look for matches. This array should be treated
 * as read only by the programmer.
 *
* @return The char[] buffer to be used as input by PatternMatcher * implementations. */ public char[] getBuffer() { return _originalBuffer; } /** * Returns whether or not the end of the input has been reached. *
* @return True if the current offset is greater than or equal to the * end offset. */ public boolean endOfInput(){ return (_currentOffset >= _endOffset); } /** * @return The offset of the input that should be considered the start * of the region to be considered as input by PatternMatcher * methods. */ public int getBeginOffset() { return _beginOffset; } /** * @return The offset of the input that should be considered the end * of the region to be considered as input by PatternMatcher * methods. This offset is actually 1 plus the last offset * that is part of the input region. */ public int getEndOffset() { return _endOffset; } /** * @return The offset of the input that should be considered the current * offset where PatternMatcher methods should start looking for * matches. */ public int getCurrentOffset() { return _currentOffset; } /** * Sets the offset of the input that should be considered the start * of the region to be considered as input by PatternMatcher * methods. In other words, everything before this offset is ignored * by a PatternMatcher. *
* @param offset The offset to use as the beginning of the input. */ public void setBeginOffset(int offset) { _beginOffset = offset; } /** * Sets the offset of the input that should be considered the end * of the region to be considered as input by PatternMatcher * methods. This offset is actually 1 plus the last offset * that is part of the input region. *
* @param offset The offset to use as the end of the input. */ public void setEndOffset(int offset) { _endOffset = offset; } /** * Sets the offset of the input that should be considered the current * offset where PatternMatcher methods should start looking for * matches. Also resets all match offset information to -1. By calling * this method, you invalidate all previous match information. Therefore * a PatternMatcher implementation must call this method before setting * match offset information. *
* @param offset The offset to use as the current offset. */ public void setCurrentOffset(int offset) { _currentOffset = offset; setMatchOffsets(-1, -1); } /** * Returns the string representation of the input, where the input is * considered to start from the begin offset and end at the end offset. *
* @return The string representation of the input. */ public String toString() { return new String(_originalBuffer, _beginOffset, length()); } /** * A convenience method returning the part of the input occurring before * the last match found by a call to a Perl5Matcher * {@link Perl5Matcher#contains contains} method. *
 * @return The input preceding a match.
 */
public String preMatch() {
  return new String(_originalBuffer, _beginOffset,
                    _matchBeginOffset - _beginOffset);
}

/**
 * A convenience method returning the part of the input occurring after
 * the last match found by a call to a Perl5Matcher
 * {@link Perl5Matcher#contains contains} method.
 *
* @return The input succeeding a contains() match. */ public String postMatch() { return new String(_originalBuffer, _matchEndOffset, _endOffset - _matchEndOffset); } /** * A convenience method returning the part of the input corresponding * to the last match found by a call to a Perl5Matcher * {@link Perl5Matcher#contains contains} method. * The method is not called getMatch() so as not to confuse it * with Perl5Matcher's getMatch() which returns a MatchResult instance * and also for consistency with preMatch() and postMatch(). *
* @return The input consisting of the match found by contains(). */ public String match() { return new String(_originalBuffer, _matchBeginOffset, _matchEndOffset - _matchBeginOffset); } /** * This method is intended for use by PatternMatcher implementations. * It is necessary to record the location of the previous match so that * consecutive contains() matches involving null string matches are * properly handled. If you are not implementing a PatternMatcher, forget * this method exists. If you use it outside of its intended context, you * will only disrupt the stored state. *
* As a note, the preMatch(), postMatch(), and match() methods are provided * as conveniences because PatternMatcherInput must store match offset * information to completely preserve state for consecutive PatternMatcher * contains() matches. *
* @param matchBeginOffset The begin offset of a match found by contains(). * @param matchEndOffset The end offset of a match found by contains(). */ public void setMatchOffsets(int matchBeginOffset, int matchEndOffset) { _matchBeginOffset = matchBeginOffset; _matchEndOffset = matchEndOffset; } /** * Returns the offset marking the beginning of the match found by * contains(). *
* @return The begin offset of a contains() match. */ public int getMatchBeginOffset() { return _matchBeginOffset; } /** * Returns the offset marking the end of the match found by contains(). *
* @return The end offset of a contains() match.
*/
public int getMatchEndOffset() { return _matchEndOffset; }
}
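The preMatch(), match(), and postMatch() methods above all slice one shared char[] buffer using four offsets. The following is a minimal stand-alone sketch of that arithmetic, not the library's code: the class name OffsetSketch and its static helpers are hypothetical, and the match offsets are supplied by hand rather than by a PatternMatcher.

```java
// Hypothetical illustration only; OffsetSketch is not part of jakarta-oro.
final class OffsetSketch {
    // Input before the match: from the region's begin offset to the match start.
    static String preMatch(char[] buf, int beginOffset, int matchBegin) {
        return new String(buf, beginOffset, matchBegin - beginOffset);
    }

    // The match itself: matchEnd is exclusive (one past the last matched char).
    static String match(char[] buf, int matchBegin, int matchEnd) {
        return new String(buf, matchBegin, matchEnd - matchBegin);
    }

    // Input after the match: from the match end to the region's end offset.
    static String postMatch(char[] buf, int matchEnd, int endOffset) {
        return new String(buf, matchEnd, endOffset - matchEnd);
    }

    public static void main(String[] args) {
        char[] buf = "abfoo123xyz".toCharArray();
        int begin = 0, end = buf.length;   // the whole buffer is the input region
        int matchBegin = 2, matchEnd = 8;  // assume "foo123" was matched

        System.out.println(preMatch(buf, begin, matchBegin));
        System.out.println(match(buf, matchBegin, matchEnd));
        System.out.println(postMatch(buf, matchEnd, end));
    }
}
```

Because setCurrentOffset() resets both match offsets to -1, the same arithmetic would hand a negative length to the String constructor if these methods were called before a successful match, so it seems safest to call them only after contains() has returned true.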
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Pattern.java 0000644 0001750 0001750 00000012164 07773723336 025073 0 ustar arnaud arnaud /*
* $Id: Perl5Pattern.java,v 1.8 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @return The original string representation of the regular expression * pattern. */ public String getPattern() { return _expression; } /** * This method returns an integer containing the compilation options used * to compile this pattern. *
* @return The compilation options used to compile the pattern.
*/
public int getOptions() { return _options; }
/*
// For testing
public String toString() {
return "Parens: " + _numParentheses + " " + _beginMatchOffsets.length + " "
+ _endMatchOffsets.length;
}
*/
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/PatternCompiler.java 0000644 0001750 0001750 00000016644 07773723336 025665 0 ustar arnaud arnaud /*
* $Id: PatternCompiler.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * A PatternCompiler instance is used to compile the string representation
 * (either as a String or char[]) of a regular expression into a Pattern
 * instance. The Pattern can then be used in conjunction with the appropriate
 * PatternMatcher instance to perform pattern searches. A form
 * of use might be:
 *
 * <pre>
 * PatternCompiler compiler;
 * PatternMatcher matcher;
 * Pattern pattern;
 * String input;
 *
 * // Initialization of compiler, matcher, and input omitted;
 *
 * try {
 *   pattern = compiler.compile("\\d+");
 * } catch(MalformedPatternException e) {
 *   System.out.println("Bad pattern.");
 *   System.out.println(e.getMessage());
 *   System.exit(1);
 * }
 *
 *
 * if(matcher.matches(input, pattern))
 *    System.out.println(input + " is a number");
 * else
 *    System.out.println(input + " is not a number");
 * </pre>
 *
* Specific PatternCompiler implementations such as Perl5Compiler may have * variations of the compile() methods that take extra options affecting * the compilation of a pattern. However, the PatternCompiler method * implementations should provide the default behavior of the class. * * @version @version@ * @since 1.0 * @see Pattern * @see PatternMatcher * @see MalformedPatternException */ public interface PatternCompiler { /** * Compiles a regular expression into a data structure that can be used * by a PatternMatcher implementation to perform pattern matching. *
* @param pattern A regular expression to compile. * @return A Pattern instance constituting the compiled regular expression. * @exception MalformedPatternException If the compiled expression * does not conform to the grammar understood by the PatternCompiler or * if some other error in the expression is encountered. */ public Pattern compile(String pattern) throws MalformedPatternException; /** * Compiles a regular expression into a data structure that can be * used by a PatternMatcher implementation to perform pattern matching. * Additional regular expression syntax specific options can be passed * as a bitmask of options. *
* @param pattern A regular expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the regular expression. The flags * are a logical OR of any number of the allowable * constants permitted by the PatternCompiler * implementation. * @return A Pattern instance constituting the compiled regular expression. * @exception MalformedPatternException If the compiled expression * does not conform to the grammar understood by the PatternCompiler or * if some other error in the expression is encountered. */ public Pattern compile(String pattern, int options) throws MalformedPatternException; /** * Compiles a regular expression into a data structure that can be used * by a PatternMatcher implementation to perform pattern matching. *
* @param pattern A regular expression to compile. * @return A Pattern instance constituting the compiled regular expression. * @exception MalformedPatternException If the compiled expression * does not conform to the grammar understood by the PatternCompiler or * if some other error in the expression is encountered. */ public Pattern compile(char[] pattern) throws MalformedPatternException; /** * Compiles a regular expression into a data structure that can be * used by a PatternMatcher implementation to perform pattern matching. * Additional regular expression syntax specific options can be passed * as a bitmask of options. *
* @param pattern A regular expression to compile.
* @param options A set of flags giving the compiler instructions on
* how to treat the regular expression. The flags
* are a logical OR of any number of the allowable
* constants permitted by the PatternCompiler
* implementation.
* @return A Pattern instance constituting the compiled regular expression.
* @exception MalformedPatternException If the compiled expression
* does not conform to the grammar understood by the PatternCompiler or
* if some other error in the expression is encountered.
*/
public Pattern compile(char[] pattern, int options)
throws MalformedPatternException;
}
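The compile() variants above take options as a bitmask formed by OR-ing constants and report bad expressions through a checked exception. As a rough analogy only, the sketch below shows the same two ideas with the JDK's own java.util.regex package rather than with a PatternCompiler implementation; the class CompileOptionsSketch and the helper compileOrNull are hypothetical names, and the JDK flags are not the ORO option constants.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration using java.util.regex as a stand-in for ORO.
final class CompileOptionsSketch {
    // Compiles regex with the given OR-ed flags; returns null if it is malformed.
    static Pattern compileOrNull(String regex, int options) {
        try {
            return Pattern.compile(regex, options);
        } catch (PatternSyntaxException e) {
            // Plays the role MalformedPatternException plays for PatternCompiler.
            return null;
        }
    }

    public static void main(String[] args) {
        // Two independent flags combined into one bitmask.
        int options = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        Pattern p = compileOrNull("^abc$", options);
        System.out.println(p.matcher("x\nABC\ny").find());

        // An unclosed group is rejected at compile time.
        System.out.println(compileOrNull("(unclosed", 0) == null);
    }
}
```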
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/MalformedPatternException.java 0000644 0001750 0001750 00000007334 07773723336 027674 0 ustar arnaud arnaud /*
* $Id: MalformedPatternException.java,v 1.8 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param message A message indicating the nature of the parse error.
*/
public MalformedPatternException(String message) {
super(message);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/OpCode.java 0000644 0001750 0001750 00000021172 07773723336 023716 0 ustar arnaud arnaud /*
* $Id: OpCode.java,v 1.11 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * A MatchResult instance contains a pattern match and its saved groups.
 * You can access the entire match directly using the
 * {@link #group(int)} method with an argument of 0,
 * or by the {@link #toString()} method which is
 * defined to return the same thing. It is also possible to obtain
 * the beginning and ending offsets of a match relative to the input
 * producing the match by using the
 * {@link #beginOffset(int)} and {@link #endOffset(int)} methods. The
 * {@link #begin(int)} and {@link #end(int)} methods are useful in some
 * circumstances and return the begin and end offsets of the subgroups
 * of a match relative to the beginning of the match.
 *
 * You might use a MatchResult as follows:
 * <pre>
 * int groups;
 * PatternMatcher matcher;
 * PatternCompiler compiler;
 * Pattern pattern;
 * PatternMatcherInput input;
 * MatchResult result;
 *
 * compiler = new Perl5Compiler();
 * matcher = new Perl5Matcher();
 *
 * try {
 *   pattern = compiler.compile(somePatternString);
 * } catch(MalformedPatternException e) {
 *   System.out.println("Bad pattern.");
 *   System.out.println(e.getMessage());
 *   return;
 * }
 *
 * input = new PatternMatcherInput(someStringInput);
 *
 * while(matcher.contains(input, pattern)) {
 *   result = matcher.getMatch();
 *   // Perform whatever processing on the result you want.
 *   // Here we just print out all its elements to show how its
 *   // methods are used.
 *
 *   System.out.println("Match: " + result.toString());
 *   System.out.println("Length: " + result.length());
 *   groups = result.groups();
 *   System.out.println("Groups: " + groups);
 *   System.out.println("Begin offset: " + result.beginOffset(0));
 *   System.out.println("End offset: " + result.endOffset(0));
 *   System.out.println("Saved Groups: ");
 *
 *   // Start at 1 because we just printed out group 0
 *   for(int group = 1; group < groups; group++) {
 *     System.out.println(group + ": " + result.group(group));
 *     System.out.println("Begin: " + result.begin(group));
 *     System.out.println("End: " + result.end(group));
 *   }
 * }
 * </pre>
 *
 * @version @version@
 * @since 1.0
 * @see PatternMatcher
 */
public interface MatchResult {
/**
 * A convenience method returning the length of the entire match.
 * If you want to get the length of a particular subgroup you should
 * use the {@link #group(int)} method to get
 * the string and then access its length() method as follows:
 * <pre>
 * int length = -1; // Use -1 to indicate group doesn't exist
 * MatchResult result;
 * String subgroup;
 *
 * // Initialization of result omitted
 *
 * subgroup = result.group(1);
 * if(subgroup != null)
 *   length = subgroup.length();
 * </pre>
 *
 * The length() method serves as a more efficient way to do:
 * <pre>
 * length = result.group(0).length();
 * </pre>
 *
* @return The length of the match.
*/
public int length();
/**
* @return The number of groups contained in the result. This number
* includes the 0th group. In other words, the result refers
* to the number of parenthesized subgroups plus the entire match
* itself.
*/
public int groups();
/**
* Returns the contents of the parenthesized subgroups of a match,
* counting parentheses from left to right and starting from 1.
* Group 0 always refers to the entire match. For example, if the
 * pattern <code>foo(\d+)</code> is used to extract a match
 * from the input <code>abfoo123</code>, then <code>group(0)</code>
 * will return <code>foo123</code> and <code>group(1)</code> will return
 * <code>123</code>. <code>group(2)</code> will return
 * <code>null</code> because there is only one subgroup in the original
 * pattern.
*
* @param group The pattern subgroup to return. * @return A string containing the indicated pattern subgroup. Group * 0 always refers to the entire match. If a group was never * matched, it returns null. This is not to be confused with * a group matching the null string, which will return a String * of length 0. */ public String group(int group); /** * @param group The pattern subgroup. * @return The offset into group 0 of the first token in the indicated * pattern subgroup. If a group was never matched or does * not exist, returns -1. Be aware that a group that matches * the null string at the end of a match will have an offset * equal to the length of the string, so you shouldn't blindly * use the offset to index an array or String. */ public int begin(int group); /** * @param group The pattern subgroup. * @return Returns one plus the offset into group 0 of the last token in * the indicated pattern subgroup. If a group was never matched * or does not exist, returns -1. A group matching the null * string will return its start offset. */ public int end(int group); /** * Returns an offset marking the beginning of the pattern match * relative to the beginning of the input from which the match * was extracted. *
* @param group The pattern subgroup. * @return The offset of the first token in the indicated * pattern subgroup. If a group was never matched or does * not exist, returns -1. */ public int beginOffset(int group); /** * Returns an offset marking the end of the pattern match * relative to the beginning of the input from which the match was * extracted. *
* @param group The pattern subgroup.
* @return Returns one plus the offset of the last token in
* the indicated pattern subgroup. If a group was never matched
* or does not exist, returns -1. A group matching the null
* string will return its start offset.
*/
public int endOffset(int group);
/**
* Returns the same as group(0).
*
* @return A string containing the entire match.
*/
public String toString();
}
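The interface above distinguishes offsets relative to the match (begin, end) from offsets relative to the input (beginOffset, endOffset). The sketch below works through the documented example, pattern foo(\d+) against abfoo123, using java.util.regex as a stand-in, since its Matcher.start and Matcher.end are input-relative. The class GroupOffsetSketch and its helpers are hypothetical, and the -1 convention for unmatched groups is taken from the Javadoc above rather than from the ORO implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; java.util.regex stands in for a PatternMatcher.
final class GroupOffsetSketch {
    // Finds the first match or fails loudly, so callers get a usable Matcher.
    static Matcher find(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }

    // Group start relative to the start of group 0, or -1 if the group never matched.
    static int begin(Matcher m, int group) {
        int start = m.start(group);
        return start < 0 ? -1 : start - m.start(0);
    }

    // One past the group's last char, relative to group 0, or -1 if unmatched.
    static int end(Matcher m, int group) {
        int stop = m.end(group);
        return stop < 0 ? -1 : stop - m.start(0);
    }

    public static void main(String[] args) {
        Matcher m = find("foo(\\d+)", "abfoo123");
        System.out.println("group(0)       = " + m.group(0));
        System.out.println("group(1)       = " + m.group(1));
        System.out.println("beginOffset(1) = " + m.start(1));   // relative to input
        System.out.println("begin(1)       = " + begin(m, 1));  // relative to match
        System.out.println("end(1)         = " + end(m, 1));
        // groups() counts group 0 as well, so it is one more than groupCount().
        System.out.println("groups()       = " + (m.groupCount() + 1));
    }
}
```

Subtracting the start of group 0 is all that separates the two families of offsets, which is why the match-relative values are convenient when indexing into the string returned by group(0).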
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Debug.java 0000644 0001750 0001750 00000023315 07773723336 024504 0 ustar arnaud arnaud /*
* $Id: Perl5Debug.java,v 1.11 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * @param regexp The Perl5Pattern to print.
 * @return A string representation of the bytecode program defining the
 *         regular expression.
 */
public static String printProgram(Perl5Pattern regexp) {
  StringBuffer buffer;
  char operator = OpCode._OPEN, prog[];
  int offset, next;

  prog = regexp._program;
  offset = 1;
  buffer = new StringBuffer();

  while(operator != OpCode._END) {
    operator = prog[offset];
    buffer.append(offset);
    _printOperator(prog, offset, buffer);

    next = OpCode._getNext(prog, offset);
    offset+=OpCode._operandLength[operator];

    buffer.append("(" + next + ")");

    offset+=2;

    if(operator == OpCode._ANYOF) {
      offset += 16;
    } else if(operator == OpCode._ANYOFUN || operator == OpCode._NANYOFUN) {
      while(prog[offset] != OpCode._END) {
        if(prog[offset] == OpCode._RANGE)
          offset+=3;
        else
          offset+=2;
      }
      ++offset;
    } else if(operator == OpCode._EXACTLY) {
      ++offset;
      buffer.append(" <");

      //while(prog[offset] != '0')
      while(prog[offset] != CharStringPointer._END_OF_STRING) {
        //while(prog[offset] != 0 &&
        //      prog[offset] != CharStringPointer._END_OF_STRING) {
        buffer.append(prog[offset]);
        ++offset;
      }
      buffer.append(">");
      ++offset;
    }

    buffer.append('\n');
  }

  // Can print some other stuff here.
if(regexp._startString != null) buffer.append("start `" + new String(regexp._startString) + "' "); if(regexp._startClassOffset != OpCode._NULL_OFFSET) { buffer.append("stclass `"); _printOperator(prog, regexp._startClassOffset, buffer); buffer.append("' "); } if((regexp._anchor & Perl5Pattern._OPT_ANCH) != 0) buffer.append("anchored "); if((regexp._anchor & Perl5Pattern._OPT_SKIP) != 0) buffer.append("plus "); if((regexp._anchor & Perl5Pattern._OPT_IMPLICIT) != 0) buffer.append("implicit "); if(regexp._mustString != null) buffer.append("must have \""+ new String(regexp._mustString) + "\" back " + regexp._back + " "); buffer.append("minlen " + regexp._minLength + '\n'); return buffer.toString(); } static void _printOperator(char[] program, int offset, StringBuffer buffer) { String str = null; buffer.append(":"); switch(program[offset]) { case OpCode._BOL : str = "BOL"; break; case OpCode._MBOL : str = "MBOL"; break; case OpCode._SBOL : str = "SBOL"; break; case OpCode._EOL : str = "EOL"; break; case OpCode._MEOL : str = "MEOL"; break; case OpCode._ANY : str = "ANY"; break; case OpCode._SANY : str = "SANY"; break; case OpCode._ANYOF : str = "ANYOF"; break; case OpCode._ANYOFUN : str = "ANYOFUN"; break; case OpCode._NANYOFUN : str = "NANYOFUN"; break; /* case OpCode._ANYOF : // debug buffer.append("ANYOF\n\n"); int foo = OpCode._OPERAND(offset); char ch; for(ch=0; ch < 256; ch++) { if(ch % 16 == 0) buffer.append(" "); buffer.append((program[foo + (ch >> 4)] & (1 << (ch & 0xf))) == 0 ? 
0 : 1); } buffer.append("\n\n"); break; */ case OpCode._BRANCH : str = "BRANCH"; break; case OpCode._EXACTLY : str = "EXACTLY"; break; case OpCode._NOTHING : str = "NOTHING"; break; case OpCode._BACK : str = "BACK"; break; case OpCode._END : str = "END"; break; case OpCode._ALNUM : str = "ALNUM"; break; case OpCode._NALNUM : str = "NALNUM"; break; case OpCode._BOUND : str = "BOUND"; break; case OpCode._NBOUND : str = "NBOUND"; break; case OpCode._SPACE : str = "SPACE"; break; case OpCode._NSPACE : str = "NSPACE"; break; case OpCode._DIGIT : str = "DIGIT"; break; case OpCode._NDIGIT : str = "NDIGIT"; break; case OpCode._ALPHA : str = "ALPHA"; break; case OpCode._BLANK : str = "BLANK"; break; case OpCode._CNTRL : str = "CNTRL"; break; case OpCode._GRAPH : str = "GRAPH"; break; case OpCode._LOWER : str = "LOWER"; break; case OpCode._PRINT : str = "PRINT"; break; case OpCode._PUNCT : str = "PUNCT"; break; case OpCode._UPPER : str = "UPPER"; break; case OpCode._XDIGIT : str = "XDIGIT"; break; case OpCode._ALNUMC : str = "ALNUMC"; break; case OpCode._ASCII : str = "ASCII"; break; case OpCode._CURLY : buffer.append("CURLY {"); buffer.append((int)OpCode._getArg1(program, offset)); buffer.append(','); buffer.append((int)OpCode._getArg2(program, offset)); buffer.append('}'); break; case OpCode._CURLYX: buffer.append("CURLYX {"); buffer.append((int)OpCode._getArg1(program, offset)); buffer.append(','); buffer.append((int)OpCode._getArg2(program, offset)); buffer.append('}'); break; case OpCode._REF: buffer.append("REF"); buffer.append((int)OpCode._getArg1(program, offset)); break; case OpCode._OPEN: buffer.append("OPEN"); buffer.append((int)OpCode._getArg1(program, offset)); break; case OpCode._CLOSE: buffer.append("CLOSE"); buffer.append((int)OpCode._getArg1(program, offset)); break; case OpCode._STAR : str = "STAR"; break; case OpCode._PLUS : str = "PLUS"; break; case OpCode._MINMOD : str = "MINMOD"; break; case OpCode._GBOL : str = "GBOL"; break; case OpCode._UNLESSM: str 
= "UNLESSM"; break; case OpCode._IFMATCH: str = "IFMATCH"; break; case OpCode._SUCCEED: str = "SUCCEED"; break; case OpCode._WHILEM : str = "WHILEM"; break; default: buffer.append("Operator is unrecognized. Faulty expression code!"); break; } if(str != null) buffer.append(str); } } jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/package.html 0000644 0001750 0001750 00000012714 07773723336 024165 0 ustar arnaud arnaud
This package used to be the OROMatcher library and provides both generic regular expression interfaces and Perl5 regular expression compatible implementation classes.
Note: The following information will be moved into the user's guide.
Here we summarize the syntax of Perl5.003 regular expressions, all of
which is supported by the Perl5 classes in this package. However, for
a definitive reference, you should consult the
perlre
man page
that accompanies the Perl5 distribution and also the book
Programming Perl, 2nd Edition from O'Reilly & Associates.
We are working toward implementing the features added after Perl5.003
up to and including Perl 5.6. Please remember, we only guarantee
support for Perl5.003 expressions in version 2.0.
By default, a quantified subpattern is greedy. In other words, it matches as many times as possible without causing the rest of the pattern not to match. To change a quantifier to match the minimum number of times possible, without causing the rest of the pattern not to match, place a "?" immediately after the quantifier.
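The difference between greedy and minimal quantifiers can be seen in a small sketch. To stay runnable without the ORO jar, it assumes the JDK's java.util.regex classes rather than the Perl5 classes in this package; the "?" suffix is understood to behave the same way in both. The class name GreedyVsMinimal and the helper firstMatch are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: uses java.util.regex (an assumption), not the ORO Perl5 classes.
final class GreedyVsMinimal {
  // Returns the first substring of input matched by regex, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "<b>bold</b> and <i>italic</i>";
    // Greedy: ".+" takes as much as it can while still letting ">" match.
    System.out.println(firstMatch("<.+>", input));
    // Minimal: ".+?" takes as little as it can while still letting ">" match.
    System.out.println(firstMatch("<.+?>", input));
  }
}
```

With the greedy form the match runs from the first "<" to the last ">" of the input; with the minimal form it stops at the first ">".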
Perl5 extended regular expressions are fully supported.
* @return The original string representation of the regular expression
* pattern.
*/
public String getPattern();

/**
* This method returns an integer containing the compilation options used
* to compile this pattern.
*
* @return The compilation options used to compile the pattern.
*/
public int getOptions();
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/CharStringPointer.java 0000644 0001750 0001750 00000010754 07773723336 026156 0 ustar arnaud arnaud /*
* $Id: CharStringPointer.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param substitution The string to use as a substitution.
*/
public StringSubstitution(String substitution) {
  setSubstitution(substitution);
}

/**
* Sets the substitution represented by this StringSubstitution. You
* should use this method in order to avoid repeatedly allocating new
* StringSubstitutions. It is recommended that you allocate a single
* StringSubstitution and reuse it by using this method when appropriate.
*
* @param substitution The string to use as a substitution.
*/
public void setSubstitution(String substitution) {
  _substitution = substitution;
  _subLength = substitution.length();
}

/**
* Returns the string substitution represented by this object.
*
* @return The string substitution represented by this object.
*/
public String getSubstitution() { return _substitution; }

/**
* Returns the same value as {@link #getSubstitution()}.
*
* @return The string substitution represented by this object.
*/
public String toString() { return getSubstitution(); }

/**
* Appends the substitution to a buffer containing the original input
* with substitutions applied for the pattern matches found so far.
* See
* {@link Substitution#appendSubstitution Substitution.appendSubstitution()}
* for more details regarding the expected behavior of this method.
*
* @param appendBuffer The buffer containing the new string resulting
* from performing substitutions on the original input.
* @param match The current match causing a substitution to be made.
* @param substitutionCount The number of substitutions that have been
* performed so far by Util.substitute.
* @param originalInput The original input upon which the substitutions are
* being performed. This is a read-only parameter and is not modified.
* @param matcher The PatternMatcher used to find the current match.
* @param pattern The Pattern used to find the current match.
*/
public void appendSubstitution(StringBuffer appendBuffer, MatchResult match,
int substitutionCount,
PatternMatcherInput originalInput,
PatternMatcher matcher, Pattern pattern)
{
if(_subLength == 0)
return;
appendBuffer.append(_substitution);
}
}
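How a caller might drive an appendSubstitution-style callback can be sketched as follows. This is not the implementation of Util.substitute; it is a minimal illustration that assumes java.util.regex for the matching so that it runs without the ORO jar. The names SubstituteSketch and substituteAll are hypothetical, and the static appendSubstitution helper only mirrors the fixed-string behaviour shown above (append nothing when the substitution is empty).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for the ORO matcher (an assumption).
final class SubstituteSketch {
  // Hypothetical stand-in for a fixed-string substitution:
  // appends the literal text, or nothing if it is empty.
  static void appendSubstitution(StringBuilder out, String substitution) {
    if (substitution.length() == 0)
      return;
    out.append(substitution);
  }

  // Replaces every match of regex in input with the literal substitution.
  static String substituteAll(String regex, String substitution, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (m.find()) {
      out.append(input, last, m.start());    // text between the previous match and this one
      appendSubstitution(out, substitution); // the replacement for this match
      last = m.end();
    }
    out.append(input, last, input.length()); // text after the final match
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(substituteAll("\\d+", "#", "a1b22c"));
  }
}
```

The substitution is treated as literal text here, with no interpolation of group references, which is the distinction between a plain string substitution and a Perl5-style one.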
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/PatternMatcher.java 0000644 0001750 0001750 00000031131 07773723336 025462 0 ustar arnaud arnaud /*
* $Id: PatternMatcher.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* This method is useful for certain common token identification tasks
* that are made more difficult without this functionality.
*
* @param input The char[] to test for a prefix match.
* @param pattern The Pattern to be matched.
* @param offset The offset at which to start searching for the prefix.
* @return True if input matches pattern, false otherwise.
*/
public boolean matchesPrefix(char[] input, Pattern pattern, int offset);

/**
* Determines if a prefix of a string matches a given pattern.
* If a prefix of the string matches the pattern, a MatchResult instance
* representing the match is made accessible via
* {@link #getMatch()}.
*
* This method is useful for certain common token identification tasks
* that are made more difficult without this functionality.
*
* @param input The String to test for a prefix match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matchesPrefix(String input, Pattern pattern);

/**
* Determines if a prefix of a string (represented as a char[])
* matches a given pattern.
* If a prefix of the string matches the pattern, a MatchResult instance
* representing the match is made accessible via
* {@link #getMatch()}.
*
* This method is useful for certain common token identification tasks
* that are made more difficult without this functionality.
*
* @param input The char[] to test for a prefix match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matchesPrefix(char[] input, Pattern pattern);

/**
* Determines if a prefix of a PatternMatcherInput instance
* matches a given pattern. If there is a match, a MatchResult instance
* representing the match is made accessible via
* {@link #getMatch()}. Unlike the
* {@link #contains(PatternMatcherInput, Pattern)}
* method, the current offset of the PatternMatcherInput argument
* is not updated. You should remember that the region starting
* from the begin offset of the PatternMatcherInput will be
* tested for a prefix match.
*
* This method is useful for certain common token identification tasks
* that are made more difficult without this functionality.
*
* @param input The PatternMatcherInput to test for a prefix match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matchesPrefix(PatternMatcherInput input, Pattern pattern);

/**
* Determines if a string exactly matches a given pattern. If
* there is an exact match, a MatchResult instance
* representing the match is made accessible via
* {@link #getMatch()}.
*
* @param input The String to test for an exact match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matches(String input, Pattern pattern);

/**
* Determines if a string (represented as a char[]) exactly matches
* a given pattern. If there is an exact match, a MatchResult
* instance representing the match is made accessible via
* {@link #getMatch()}.
*
* @param input The char[] to test for a match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matches(char[] input, Pattern pattern);

/**
* Determines if the contents of a PatternMatcherInput instance
* exactly match a given pattern. If
* there is an exact match, a MatchResult instance
* representing the match is made accessible via
* {@link #getMatch()}. Unlike the
* {@link #contains(PatternMatcherInput, Pattern)}
* method, the current offset of the PatternMatcherInput argument
* is not updated. You should remember that the region between
* the begin and end offsets of the PatternMatcherInput will be
* tested for an exact match.
*
* @param input The PatternMatcherInput to test for a match.
* @param pattern The Pattern to be matched.
* @return True if input matches pattern, false otherwise.
*/
public boolean matches(PatternMatcherInput input, Pattern pattern);

/**
* Determines if a string contains a pattern. If the pattern is
* matched by some substring of the input, a MatchResult instance
* representing the first such match is made accessible via
* {@link #getMatch()}. If you want to access
* subsequent matches you should either use a PatternMatcherInput object
* or use the offset information in the MatchResult to create a substring
* representing the remaining input. Using the MatchResult offset
* information is the recommended method of obtaining the parts of the
* string preceding the match and following the match.
*
* @param input The String to test for a match.
* @param pattern The Pattern to be matched.
* @return True if the input contains a pattern match, false otherwise.
*/
public boolean contains(String input, Pattern pattern);

/**
* Determines if a string (represented as a char[]) contains a pattern.
* If the pattern is matched by some substring of the input, a MatchResult
* instance representing the first such match is made accessible via
* {@link #getMatch()}. If you want to access
* subsequent matches you should either use a PatternMatcherInput object
* or use the offset information in the MatchResult to create a substring
* representing the remaining input. Using the MatchResult offset
* information is the recommended method of obtaining the parts of the
* string preceding the match and following the match.
*
* @param input The char[] to test for a match.
* @param pattern The Pattern to be matched.
* @return True if the input contains a pattern match, false otherwise.
*/
public boolean contains(char[] input, Pattern pattern);

/**
* Determines if the contents of a PatternMatcherInput, starting from the
* current offset of the input, contains a pattern.
* If a pattern match is found, a MatchResult
* instance representing the first such match is made accessible via
* {@link #getMatch()}. The current offset of the
* PatternMatcherInput is set to the offset corresponding to the end
* of the match, so that a subsequent call to this method will continue
* searching where the last call left off. You should remember that the
* region between the begin and end offsets of the PatternMatcherInput is
* considered the input to be searched, and that the current offset
* of the PatternMatcherInput reflects where a search will start from.
* Matches extending beyond the end offset of the PatternMatcherInput
* will not be matched. In other words, a match must occur entirely
* between the begin and end offsets of the input. See
* {@link PatternMatcherInput} for more details.
*
* This method is usually used in a loop as follows:
*
* PatternMatcher matcher;
* PatternCompiler compiler;
* Pattern pattern;
* PatternMatcherInput input;
* MatchResult result;
*
* compiler = new Perl5Compiler();
* matcher = new Perl5Matcher();
*
* try {
*   pattern = compiler.compile(somePatternString);
* } catch(MalformedPatternException e) {
*   System.out.println("Bad pattern.");
*   System.out.println(e.getMessage());
*   return;
* }
*
* input = new PatternMatcherInput(someStringInput);
*
* while(matcher.contains(input, pattern)) {
*   result = matcher.getMatch();
*   // Perform whatever processing on the result you want.
* }
*
* @param input The PatternMatcherInput to test for a match.
* @param pattern The Pattern to be matched.
* @return True if the input contains a pattern match, false otherwise.
*/
public boolean contains(PatternMatcherInput input, Pattern pattern);

/**
* Fetches the last match found by a call to a matches() or contains()
* method.
*
* @return A MatchResult instance containing the pattern match found
* by the last call to any one of the matches() or contains()
* methods. If no match was found by the last call,
* returns null.
*/
public MatchResult getMatch();
}
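The three kinds of test described by this interface (prefix match, exact match, and containment) can be contrasted in a short sketch. It assumes the JDK's java.util.regex as a stand-in so that it runs without the ORO jar: Matcher.lookingAt() is used as a rough analogue of matchesPrefix(), Matcher.matches() of matches(), and Matcher.find() of contains(). The class name MatchKinds and its helper methods are hypothetical, and the analogy is approximate rather than a statement about how Perl5Matcher is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex analogues (an assumption), not the ORO interfaces.
final class MatchKinds {
  // Prefix match: the pattern must match starting at the beginning of the input.
  static boolean matchesPrefix(String input, String regex) {
    return Pattern.compile(regex).matcher(input).lookingAt();
  }

  // Exact match: the pattern must match the entire input.
  static boolean matchesExactly(String input, String regex) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // Containment: the pattern may match any substring of the input.
  static boolean contains(String input, String regex) {
    return Pattern.compile(regex).matcher(input).find();
  }

  // Collects successive matches; each search resumes where the last match ended.
  static List<String> allMatches(String input, String regex) {
    List<String> results = new ArrayList<String>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find())
      results.add(m.group());
    return results;
  }

  public static void main(String[] args) {
    String input = "foo123bar45";
    System.out.println(matchesPrefix(input, "[a-z]+"));
    System.out.println(matchesExactly(input, "[a-z]+"));
    System.out.println(contains(input, "\\d+"));
    System.out.println(allMatches(input, "\\d+"));
  }
}
```

The allMatches loop plays the role that the current offset of a PatternMatcherInput plays in the contains() loop: the matcher keeps its own position between calls.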
jakarta-oro-2.0.8/src/java/org/apache/oro/text/regex/Perl5Matcher.java 0000644 0001750 0001750 00000157250 07773723336 025047 0 ustar arnaud arnaud /*
* $Id: Perl5Matcher.java,v 1.27 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* Perl5Compiler and Perl5Matcher are designed with the intent that * you use a separate instance of each per thread to avoid the overhead * of both synchronization and concurrent access (e.g., a match that takes * a long time in one thread will block the progress of another thread with * a shorter match). If you want to use a single instance of each * in a concurrent program, you must appropriately protect access to * the instances with critical sections. If you want to share Perl5Pattern * instances between concurrently executing instances of Perl5Matcher, you * must compile the patterns with {@link Perl5Compiler#READ_ONLY_MASK}. * * @version @version@ * @since 1.0 * @see PatternMatcher * @see Perl5Compiler */ public final class Perl5Matcher implements PatternMatcher { private static final char __EOS = Character.MAX_VALUE; private static final int __INITIAL_NUM_OFFSETS = 20; private boolean __multiline = false, __lastSuccess = false; private boolean __caseInsensitive = false; private char __previousChar, __input[], __originalInput[]; private Perl5Repetition __currentRep; private int __numParentheses, __bol, __eol, __currentOffset, __endOffset; private char[] __program; private int __expSize, __inputOffset, __lastParen; private int[] __beginMatchOffsets, __endMatchOffsets; private Stack __stack = new Stack(); private Perl5MatchResult __lastMatchResult = null; private static boolean __compare(char[] s1, int s1Offs, char[] s2, int s2Offs, int n) { int cnt; for(cnt = 0; cnt < n; cnt++, s1Offs++, s2Offs++) { if(s1Offs >= s1.length) return false; if(s2Offs >= s2.length) return false; if(s1[s1Offs] != s2[s2Offs]) return false; } return true; } private static int __findFirst(char[] input, int current, int endOffset, char[] mustString) { int count, saveCurrent; char ch; if(input.length == 0) return endOffset; ch = mustString[0]; // Find the offset of the first character of the must string while(current < endOffset) { if(ch == input[current]){ saveCurrent = current; count = 
0; while(current < endOffset && count < mustString.length) { if(mustString[count] != input[current]) break; ++count; ++current; } current = saveCurrent; if(count >= mustString.length) break; } ++current; } return current; } private void __pushState(int parenFloor) { int[] state; int stateEntries, paren; stateEntries = 3*(__expSize - parenFloor); if(stateEntries <= 0) state = new int[3]; else state = new int[stateEntries + 3]; state[0] = __expSize; state[1] = __lastParen; state[2] = __inputOffset; for(paren = __expSize; paren > parenFloor; --paren, stateEntries-=3) { state[stateEntries] = __endMatchOffsets[paren]; state[stateEntries + 1] = __beginMatchOffsets[paren]; state[stateEntries + 2] = paren; } __stack.push(state); } private void __popState() { int[] state; int entry, paren; state = (int[])__stack.pop(); __expSize = state[0]; __lastParen = state[1]; __inputOffset = state[2]; for(entry = 3; entry < state.length; entry+=3) { paren = state[entry + 2]; __beginMatchOffsets[paren] = state[entry + 1]; if(paren <= __lastParen) __endMatchOffsets[paren] = state[entry]; } for(paren = __lastParen + 1; paren <= __numParentheses; paren++) { if(paren > __expSize) __beginMatchOffsets[paren] = OpCode._NULL_OFFSET; __endMatchOffsets[paren] = OpCode._NULL_OFFSET; } } // Initialize globals needed before calling __tryExpression for first time private void __initInterpreterGlobals(Perl5Pattern expression, char[] input, int beginOffset, int endOffset, int currentOffset) { // Remove this hack after more efficient case-folding and unicode // character classes are implemented __caseInsensitive = expression._isCaseInsensitive; __input = input; __endOffset = endOffset; __currentRep = new Perl5Repetition(); __currentRep._numInstances = 0; __currentRep._lastRepetition = null; __program = expression._program; __stack.setSize(0); // currentOffset should always be >= beginOffset and should // always be equal to zero when beginOffset equals 0, but we // make a weak attempt to protect against 
a violation of this // precondition if(currentOffset == beginOffset || currentOffset <= 0) __previousChar = '\n'; else { __previousChar = input[currentOffset - 1]; if(!__multiline && __previousChar == '\n') __previousChar = '\0'; } __numParentheses = expression._numParentheses; __currentOffset = currentOffset; __bol = beginOffset; __eol = endOffset; // Ok, here we're using endOffset as a temporary variable. endOffset = __numParentheses + 1; if(__beginMatchOffsets == null || endOffset > __beginMatchOffsets.length) { if(endOffset < __INITIAL_NUM_OFFSETS) endOffset = __INITIAL_NUM_OFFSETS; __beginMatchOffsets = new int[endOffset]; __endMatchOffsets = new int[endOffset]; } } // Set the match result information. Only call this if we successfully // matched. private void __setLastMatchResult() { int offs, maxEndOffs = 0; //endOffset+=dontTry; __lastMatchResult = new Perl5MatchResult(__numParentheses + 1); // This can happen when using Perl5StreamInput if(__endMatchOffsets[0] > __originalInput.length) throw new ArrayIndexOutOfBoundsException(); __lastMatchResult._matchBeginOffset = __beginMatchOffsets[0]; while(__numParentheses >= 0) { offs = __beginMatchOffsets[__numParentheses]; if(offs >= 0) __lastMatchResult._beginGroupOffset[__numParentheses] = offs - __lastMatchResult._matchBeginOffset; else __lastMatchResult._beginGroupOffset[__numParentheses] = OpCode._NULL_OFFSET; offs = __endMatchOffsets[__numParentheses]; if(offs >= 0) { __lastMatchResult._endGroupOffset[__numParentheses] = offs - __lastMatchResult._matchBeginOffset; if(offs > maxEndOffs && offs <= __originalInput.length) maxEndOffs = offs; } else __lastMatchResult._endGroupOffset[__numParentheses] = OpCode._NULL_OFFSET; --__numParentheses; } __lastMatchResult._match = new String(__originalInput, __beginMatchOffsets[0], maxEndOffs - __beginMatchOffsets[0]); // Free up for garbage collection __originalInput = null; } // Expects to receive a valid regular expression program. 
No checking // is done to ensure validity. // __originalInput must be set before calling this method for // __lastMatchResult to be set correctly. // beginOffset marks the beginning of the string // currentOffset marks where to start the pattern search private boolean __interpret(Perl5Pattern expression, char[] input, int beginOffset, int endOffset, int currentOffset) { boolean success; int minLength = 0, dontTry = 0, offset; char ch, mustString[]; __initInterpreterGlobals(expression, input, beginOffset, endOffset, currentOffset); success = false; mustString = expression._mustString; _mainLoop: while(true) { if(mustString != null && ((expression._anchor & Perl5Pattern._OPT_ANCH) == 0 || ((__multiline || (expression._anchor & Perl5Pattern._OPT_ANCH_MBOL) != 0) && expression._back >= 0))) { __currentOffset = __findFirst(__input, __currentOffset, endOffset, mustString); if(__currentOffset >= endOffset) { if((expression._options & Perl5Compiler.READ_ONLY_MASK) == 0) expression._mustUtility++; success = false; break _mainLoop; } else if(expression._back >= 0) { __currentOffset-=expression._back; if(__currentOffset < currentOffset) __currentOffset = currentOffset; minLength = expression._back + mustString.length; } else if(!expression._isExpensive && (expression._options & Perl5Compiler.READ_ONLY_MASK) == 0 && (--expression._mustUtility < 0)) { // Be careful! The preceding logical expression is constructed // so that mustUtility is only decremented if the expression is // compiled without READ_ONLY_MASK. 
mustString = expression._mustString = null; __currentOffset = currentOffset; } else { __currentOffset = currentOffset; minLength = mustString.length; } } if((expression._anchor & Perl5Pattern._OPT_ANCH) != 0) { if(__currentOffset == beginOffset && __tryExpression(beginOffset)) { success = true; break _mainLoop; } else if(__multiline || (expression._anchor & Perl5Pattern._OPT_ANCH_MBOL) != 0 || (expression._anchor & Perl5Pattern._OPT_IMPLICIT) != 0) { if(minLength > 0) dontTry = minLength - 1; endOffset-=dontTry; if(__currentOffset > currentOffset) --__currentOffset; while(__currentOffset < endOffset) { if(__input[__currentOffset++] == '\n') { if(__currentOffset < endOffset && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } } } } break _mainLoop; } if(expression._startString != null) { mustString = expression._startString; if((expression._anchor & Perl5Pattern._OPT_SKIP) != 0) { ch = mustString[0]; while(__currentOffset < endOffset) { if(ch == __input[__currentOffset]) { if(__tryExpression(__currentOffset)){ success = true; break _mainLoop; } ++__currentOffset; while(__currentOffset < endOffset && __input[__currentOffset] == ch) ++__currentOffset; } ++__currentOffset; } } else { while((__currentOffset = __findFirst(__input, __currentOffset, endOffset, mustString)) < endOffset){ if(__tryExpression(__currentOffset)) { success = true; break _mainLoop; } ++__currentOffset; } } break _mainLoop; } if((offset = expression._startClassOffset) != OpCode._NULL_OFFSET) { boolean doEvery, tmp; char op; doEvery = ((expression._anchor & Perl5Pattern._OPT_SKIP) == 0); if(minLength > 0) dontTry = minLength - 1; endOffset -= dontTry; tmp = true; switch(op = __program[offset]) { case OpCode._ANYOF: offset = OpCode._getOperand(offset); while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(ch < 256 && (__program[offset + (ch >> 4)] & (1 << (ch & 0xf))) == 0) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else 
tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._ANYOFUN: case OpCode._NANYOFUN: offset = OpCode._getOperand(offset); while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(__matchUnicodeClass(ch, __program, offset, op)) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._BOUND: if(minLength > 0) { ++dontTry; --endOffset; } if(__currentOffset != beginOffset) { ch = __input[__currentOffset - 1]; tmp = OpCode._isWordCharacter(ch); } else tmp = OpCode._isWordCharacter(__previousChar); while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(tmp != OpCode._isWordCharacter(ch)){ tmp = !tmp; if(__tryExpression(__currentOffset)) { success = true; break _mainLoop; } } ++__currentOffset; } if((minLength > 0 || tmp) && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } break; case OpCode._NBOUND: if(minLength > 0) { ++dontTry; --endOffset; } if(__currentOffset != beginOffset) { ch = __input[__currentOffset - 1]; tmp = OpCode._isWordCharacter(ch); } else tmp = OpCode._isWordCharacter(__previousChar); while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(tmp != OpCode._isWordCharacter(ch)) tmp = !tmp; else if(__tryExpression(__currentOffset)) { success = true; break _mainLoop; } ++__currentOffset; } if((minLength > 0 || !tmp) && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } break; case OpCode._ALNUM: while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(OpCode._isWordCharacter(ch)) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._NALNUM: while(__currentOffset < endOffset) { ch = __input[__currentOffset]; if(!OpCode._isWordCharacter(ch)) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else 
tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._SPACE: while(__currentOffset < endOffset) { if(Character.isWhitespace(__input[__currentOffset])) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._NSPACE: while(__currentOffset < endOffset) { if(!Character.isWhitespace(__input[__currentOffset])) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._DIGIT: while(__currentOffset < endOffset) { if(Character.isDigit(__input[__currentOffset])) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; case OpCode._NDIGIT: while(__currentOffset < endOffset) { if(!Character.isDigit(__input[__currentOffset])) { if(tmp && __tryExpression(__currentOffset)) { success = true; break _mainLoop; } else tmp = doEvery; } else tmp = true; ++__currentOffset; } break; } // end switch } else { if(minLength > 0) dontTry = minLength - 1; endOffset-=dontTry; do { if(__tryExpression(__currentOffset)) { success = true; break _mainLoop; } } while(__currentOffset++ < endOffset); } break _mainLoop; } // end while __lastSuccess = success; __lastMatchResult = null; return success; } private boolean __matchUnicodeClass(char code, char __program[], int offset ,char opcode) { boolean isANYOF = ( opcode == OpCode._ANYOFUN ); while( __program[offset] != OpCode._END ){ if( __program[offset] == OpCode._RANGE ){ offset++; if((code >= __program[offset]) && (code <= __program[offset+1])){ return isANYOF; } else { offset+=2; } } else if(__program[offset] == OpCode._ONECHAR) { offset++; if(__program[offset++] == code) return isANYOF; } else { isANYOF = (__program[offset] == OpCode._OPCODE) ? 
isANYOF : !isANYOF; offset++; switch ( __program[offset++] ) { case OpCode._ALNUM: if(OpCode._isWordCharacter(code)) return isANYOF; break; case OpCode._NALNUM: if(!OpCode._isWordCharacter(code)) return isANYOF; break; case OpCode._SPACE: if(Character.isWhitespace(code)) return isANYOF; break; case OpCode._NSPACE: if(!Character.isWhitespace(code)) return isANYOF; break; case OpCode._DIGIT: if(Character.isDigit(code)) return isANYOF; break; case OpCode._NDIGIT: if(!Character.isDigit(code)) return isANYOF; break; case OpCode._ALNUMC: if(Character.isLetterOrDigit(code)) return isANYOF; break; case OpCode._ALPHA: if(Character.isLetter(code)) return isANYOF; break; case OpCode._BLANK: if(Character.isSpaceChar(code)) return isANYOF; break; case OpCode._CNTRL: if(Character.isISOControl(code)) return isANYOF; break; case OpCode._LOWER: if(Character.isLowerCase(code)) return isANYOF; // Remove this hack after more efficient case-folding and unicode // character classes are implemented if(__caseInsensitive && Character.isUpperCase(code)) return isANYOF; break; case OpCode._UPPER: if(Character.isUpperCase(code)) return isANYOF; // Remove this hack after more efficient case-folding and unicode // character classes are implemented if(__caseInsensitive && Character.isLowerCase(code)) return isANYOF; break; case OpCode._PRINT: if(Character.isSpaceChar(code)) return isANYOF; // Fall through to check if the character is alphanumeric, // or a punctuation mark. Printable characters are either // alphanumeric, punctuation marks, or spaces. case OpCode._GRAPH: if(Character.isLetterOrDigit(code)) return isANYOF; // Fall through to check if the character is a punctuation mark. // Graph characters are either alphanumeric or punctuation. 
case OpCode._PUNCT: switch ( Character.getType(code) ) { case Character.DASH_PUNCTUATION: case Character.START_PUNCTUATION: case Character.END_PUNCTUATION: case Character.CONNECTOR_PUNCTUATION: case Character.OTHER_PUNCTUATION: case Character.MATH_SYMBOL: case Character.CURRENCY_SYMBOL: case Character.MODIFIER_SYMBOL: return isANYOF; default: break; } break; case OpCode._XDIGIT: if( (code >= '0' && code <= '9') || (code >= 'a' && code <= 'f') || (code >= 'A' && code <= 'F')) return isANYOF; break; case OpCode._ASCII: if(code < 0x80)return isANYOF; } } } return !isANYOF; } private boolean __tryExpression(int offset) { int count; __inputOffset = offset; __lastParen = 0; __expSize = 0; if(__numParentheses > 0) { for(count=0; count <= __numParentheses; count++) { __beginMatchOffsets[count] = OpCode._NULL_OFFSET; __endMatchOffsets[count] = OpCode._NULL_OFFSET; } } if(__match(1)){ __beginMatchOffsets[0] = offset; __endMatchOffsets[0] = __inputOffset; return true; } return false; } private int __repeat(int offset, int max) { int scan, eol, operand, ret; char ch; char op; scan = __inputOffset; eol = __eol; if(max != Character.MAX_VALUE && max < eol - scan) eol = scan + max; operand = OpCode._getOperand(offset); switch(op = __program[offset]) { case OpCode._ANY: while(scan < eol && __input[scan] != '\n') ++scan; break; case OpCode._SANY: scan = eol; break; case OpCode._EXACTLY: ++operand; while(scan < eol && __program[operand] == __input[scan]) ++scan; break; case OpCode._ANYOF: if(scan < eol && (ch = __input[scan]) < 256) { while((ch < 256 ) && (__program[operand + (ch >> 4)] & (1 << (ch & 0xf))) == 0) { if(++scan < eol) ch = __input[scan]; else break; } } break; case OpCode._ANYOFUN: case OpCode._NANYOFUN: if(scan < eol) { ch = __input[scan]; while(__matchUnicodeClass(ch, __program, operand, op)){ if(++scan < eol) ch = __input[scan]; else break; } } break; case OpCode._ALNUM: while(scan < eol && OpCode._isWordCharacter(__input[scan])) ++scan; break; case OpCode._NALNUM: 
while(scan < eol && !OpCode._isWordCharacter(__input[scan])) ++scan; break; case OpCode._SPACE: while(scan < eol && Character.isWhitespace(__input[scan])) ++scan; break; case OpCode._NSPACE: while(scan < eol && !Character.isWhitespace(__input[scan])) ++scan; break; case OpCode._DIGIT: while(scan < eol && Character.isDigit(__input[scan])) ++scan; break; case OpCode._NDIGIT: while(scan < eol && !Character.isDigit(__input[scan])) ++scan; break; default: break; } ret = scan - __inputOffset; __inputOffset = scan; return ret; } private boolean __match(int offset) { char nextChar, op; int scan, next, input, maxScan, current, line, arg; boolean inputRemains = true, minMod = false; Perl5Repetition rep; input = __inputOffset; inputRemains = (input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); scan = offset; maxScan = __program.length; while(scan < maxScan /*&& scan > 0*/){ next = OpCode._getNext(__program, scan); switch(op = __program[scan]) { case OpCode._BOL: if(input == __bol ? __previousChar == '\n' : (__multiline && (inputRemains || input < __eol) && __input[input - 1] == '\n')) break; return false; case OpCode._MBOL: if(input == __bol ? __previousChar == '\n' : ((inputRemains || input < __eol) && __input[input - 1] == '\n')) break; return false; case OpCode._SBOL: if(input == __bol && __previousChar == '\n') break; return false; case OpCode._GBOL: if(input == __bol) break; return true; case OpCode._EOL : if((inputRemains || input < __eol) && nextChar != '\n') return false; if(!__multiline && __eol - input > 1) return false; break; case OpCode._MEOL: if((inputRemains || input < __eol) && nextChar != '\n') return false; break; case OpCode._SEOL: if((inputRemains || input < __eol) && nextChar != '\n') return false; if(__eol - input > 1) return false; break; case OpCode._SANY: if(!inputRemains && input >= __eol) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? 
__input[input] : __EOS); break; case OpCode._ANY: if((!inputRemains && input >= __eol) || nextChar == '\n') return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._EXACTLY: current = OpCode._getOperand(scan); line = __program[current++]; if(__program[current] != nextChar) return false; if(__eol - input < line) return false; if(line > 1 && !__compare(__program, current, __input, input, line)) return false; input+=line; inputRemains = (input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._ANYOF: current = OpCode._getOperand(scan); if(nextChar == __EOS && inputRemains) nextChar = __input[input]; if(nextChar >= 256 || (__program[current + (nextChar >> 4)] & (1 << (nextChar & 0xf))) != 0) return false; if(!inputRemains && input >= __eol) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._ANYOFUN: case OpCode._NANYOFUN: current = OpCode._getOperand(scan); if(nextChar == __EOS && inputRemains) nextChar = __input[input]; if(!__matchUnicodeClass(nextChar, __program, current, op)) return false; if(!inputRemains && input >= __eol) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._ALNUM: if(!inputRemains) return false; if(!OpCode._isWordCharacter(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._NALNUM: if(!inputRemains && input >= __eol) return false; if(OpCode._isWordCharacter(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? 
__input[input] : __EOS); break; case OpCode._NBOUND: case OpCode._BOUND: boolean a, b; if(input == __bol) a = OpCode._isWordCharacter(__previousChar); else a = OpCode._isWordCharacter(__input[input - 1]); b = OpCode._isWordCharacter(nextChar); if((a == b) == (__program[scan] == OpCode._BOUND)) return false; break; case OpCode._SPACE: if(!inputRemains && input >= __eol) return false; if(!Character.isWhitespace(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._NSPACE: if(!inputRemains) return false; if(Character.isWhitespace(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._DIGIT: if(!Character.isDigit(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._NDIGIT: if(!inputRemains && input >= __eol) return false; if(Character.isDigit(nextChar)) return false; inputRemains = (++input < __endOffset); nextChar = (inputRemains ? __input[input] : __EOS); break; case OpCode._REF: arg = OpCode._getArg1(__program, scan); current = __beginMatchOffsets[arg]; if(current == OpCode._NULL_OFFSET) return false; if(__endMatchOffsets[arg] == OpCode._NULL_OFFSET) return false; if(current == __endMatchOffsets[arg]) break; if(__input[current] != nextChar) return false; line = __endMatchOffsets[arg] - current; if(input + line > __eol) return false; if(line > 1 && !__compare(__input, current, __input, input, line)) return false; input+=line; inputRemains = (input < __endOffset); nextChar = (inputRemains ? 
__input[input] : __EOS); break; case OpCode._NOTHING: break; case OpCode._BACK: break; case OpCode._OPEN: arg = OpCode._getArg1(__program, scan); __beginMatchOffsets[arg] = input; if(arg > __expSize) __expSize = arg; break; case OpCode._CLOSE: arg = OpCode._getArg1(__program, scan); __endMatchOffsets[arg] = input; if(arg > __lastParen) __lastParen = arg; break; case OpCode._CURLYX: rep = new Perl5Repetition(); rep._lastRepetition = __currentRep; __currentRep = rep; rep._parenFloor = __lastParen; rep._numInstances = -1; rep._min = OpCode._getArg1(__program, scan); rep._max = OpCode._getArg2(__program, scan); rep._scan = OpCode._getNextOperator(scan) + 2; rep._next = next; rep._minMod = minMod; // Must initialize to -1 because if we initialize to 0 and are // at the beginning of the input the OpCode._WHILEM case will // not work right. rep._lastLocation = -1; __inputOffset = input; // use minMod as temporary minMod = __match(OpCode._getPrevOperator(next)); // leave scope call not pertinent? 
__currentRep = rep._lastRepetition; return minMod; case OpCode._WHILEM: rep = __currentRep; arg = rep._numInstances + 1; __inputOffset = input; if(input == rep._lastLocation) { __currentRep = rep._lastRepetition; line = __currentRep._numInstances; if(__match(rep._next)) return true; __currentRep._numInstances = line; __currentRep = rep; return false; } if(arg < rep._min) { rep._numInstances = arg; rep._lastLocation = input; if(__match(rep._scan)) return true; rep._numInstances = arg - 1; return false; } if(rep._minMod) { __currentRep = rep._lastRepetition; line = __currentRep._numInstances; if(__match(rep._next)) return true; __currentRep._numInstances = line; __currentRep = rep; if(arg >= rep._max) return false; __inputOffset = input; rep._numInstances = arg; rep._lastLocation = input; if(__match(rep._scan)) return true; rep._numInstances = arg - 1; return false; } if(arg < rep._max) { __pushState(rep._parenFloor); rep._numInstances = arg; rep._lastLocation = input; if(__match(rep._scan)) return true; __popState(); __inputOffset = input; } __currentRep = rep._lastRepetition; line = __currentRep._numInstances; if(__match(rep._next)) return true; rep._numInstances = line; __currentRep = rep; rep._numInstances = arg - 1; return false; case OpCode._BRANCH: if(__program[next] != OpCode._BRANCH) next = OpCode._getNextOperator(scan); else { int lastParen; lastParen = __lastParen; do { __inputOffset = input; if(__match(OpCode._getNextOperator(scan))) return true; for(arg = __lastParen; arg > lastParen; --arg) //__endMatchOffsets[arg] = 0; __endMatchOffsets[arg] = OpCode._NULL_OFFSET; __lastParen = arg; scan = OpCode._getNext(__program, scan); } while(scan != OpCode._NULL_OFFSET && __program[scan] == OpCode._BRANCH); return false; } break; case OpCode._MINMOD: minMod = true; break; case OpCode._CURLY: case OpCode._STAR: case OpCode._PLUS: if(op == OpCode._CURLY) { line = OpCode._getArg1(__program, scan); arg = OpCode._getArg2(__program, scan); scan = 
OpCode._getNextOperator(scan) + 2; } else if(op == OpCode._STAR) { line = 0; arg = Character.MAX_VALUE; scan = OpCode._getNextOperator(scan); } else { line = 1; arg = Character.MAX_VALUE; scan = OpCode._getNextOperator(scan); } if(__program[next] == OpCode._EXACTLY) { nextChar = __program[OpCode._getOperand(next) + 1]; current = 0; } else { nextChar = __EOS; current = -1000; } __inputOffset = input; if(minMod) { minMod = false; if(line > 0 && __repeat(scan, line) < line) return false; while(arg >= line || (arg == Character.MAX_VALUE && line > 0)) { // there may be a bug here with respect to // __inputOffset >= __endOffset, but it seems to be right for // now. the issue is with __inputOffset being reset later. // is this test really supposed to happen here? if(current == -1000 || __inputOffset >= __endOffset || __input[__inputOffset] == nextChar) { if(__match(next)) return true; } __inputOffset = input + line; if(__repeat(scan, 1) != 0) { ++line; __inputOffset = input + line; } else return false; } } else { arg = __repeat(scan, arg); if(line < arg && OpCode._opType[__program[next]] == OpCode._EOL && ((!__multiline && __program[next] != OpCode._MEOL) || __program[next] == OpCode._SEOL)) line = arg; while(arg >= line) { // there may be a bug here with respect to // __inputOffset >= __endOffset, but it seems to be right for // now. the issue is with __inputOffset being reset later. // is this test really supposed to happen here? if(current == -1000 || __inputOffset >= __endOffset || __input[__inputOffset] == nextChar) { if(__match(next)) return true; } --arg; __inputOffset = input + arg; } } return false; case OpCode._SUCCEED: case OpCode._END: __inputOffset = input; // This enforces the rule that two consecutive matches cannot have // the same end offset. 
if(__inputOffset == __lastMatchInputEndOffset) return false; return true; case OpCode._IFMATCH: __inputOffset = input; scan = OpCode._getNextOperator(scan); if(!__match(scan)) return false; break; case OpCode._UNLESSM: __inputOffset = input; scan = OpCode._getNextOperator(scan); if(__match(scan)) return false; break; default: // todo: Need to throw an exception here. } // end switch //scan = (next > 0 ? next : 0); scan = next; } // end while scan return false; } /** * Set whether or not subsequent calls to {@link #matches matches()} * or {@link #contains contains()} should treat the input as * consisting of multiple lines. The default behavior is for * input to be treated as consisting of multiple lines. This method * should only be called if the Perl5Pattern used for a match was * compiled without either of the Perl5Compiler.MULTILINE_MASK or * Perl5Compiler.SINGLELINE_MASK flags, and you want to alter the * behavior of how the ^, $, and . metacharacters are * interpreted on the fly. The compilation options used when compiling * a pattern ALWAYS override the behavior specified by setMultiline(). See * {@link Perl5Compiler} for more details. *
   * @param multiline  If set to true, treats the input as consisting of
   *        multiple lines with respect to the ^ and $
   *        metacharacters.  If set to false, treats the input as consisting
   *        of a single line with respect to the ^ and $
   *        metacharacters.
   */
  public void setMultiline(boolean multiline) { __multiline = multiline; }

  /**
   * @return True if the matcher is treating input as consisting of multiple
   *         lines with respect to the ^ and $ metacharacters,
   *         false otherwise.
   */
  public boolean isMultiline() { return __multiline; }

  char[] _toLower(char[] input) {
    int current;
    char[] inp;
    // todo:
    // Certainly not the best way to do case insensitive matching.
    // Must definitely change this in some way, but for now we
    // do what Perl does and make a copy of the input, converting
    // it all to lowercase.  This is truly better handled in the
    // compilation phase.
    inp = new char[input.length];
    System.arraycopy(input, 0, inp, 0, input.length);
    input = inp;

    // todo: Need to inline toLowerCase()
    for(current = 0; current < input.length; current++)
      if(Character.isUpperCase(input[current]))
        input[current] = Character.toLowerCase(input[current]);

    return input;
  }

  /**
   * Determines if a prefix of a string (represented as a char[])
   * matches a given pattern, starting from a given offset into the string.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
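   * The token identification use case mentioned above can be sketched as
   * follows.  This is only an illustration under stated assumptions: it uses
   * java.util.regex (region() plus lookingAt()) as a stand-in for a prefix
   * match at an offset, not this class's own API, and the class name
   * PrefixTokenizer and its token patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits input into tokens by repeatedly asking
// "does some token pattern match a prefix starting at this offset?"
class PrefixTokenizer {
    // Assumed token patterns, tried in order at the current offset.
    private static final Pattern[] TOKENS = {
        Pattern.compile("\\d+"),            // integer literal
        Pattern.compile("[A-Za-z_]\\w*"),   // identifier
        Pattern.compile("[=+()-]"),         // single-character operator
    };

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int offset = 0;
        while (offset < input.length()) {
            if (Character.isWhitespace(input.charAt(offset))) {
                offset++;                   // skip separators
                continue;
            }
            boolean matched = false;
            for (Pattern p : TOKENS) {
                Matcher m = p.matcher(input);
                m.region(offset, input.length());
                // lookingAt() succeeds only if the match begins at the
                // region start, which is the prefix-match behavior.
                if (m.lookingAt()) {
                    out.add(m.group());
                    offset = m.end();       // continue after the token
                    matched = true;
                    break;
                }
            }
            if (!matched)
                throw new IllegalArgumentException("No token at offset " + offset);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 = 42 + y")); // [x1, =, 42, +, y]
    }
}
```

   * The point of the sketch is that a prefix match never skips ahead in the
   * input, so an unrecognized character is reported at its exact offset
   * instead of being silently passed over by a search.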
   * @param input  The char[] to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @param offset The offset into input at which the prefix match must start.
   * @return True if a prefix of input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(char[] input, Pattern pattern, int offset) {
    Perl5Pattern expression;

    expression = (Perl5Pattern)pattern;

    __originalInput = input;
    if(expression._isCaseInsensitive)
      input = _toLower(input);

    __initInterpreterGlobals(expression, input, 0, input.length, offset);

    __lastSuccess = __tryExpression(offset);
    __lastMatchResult = null;

    return __lastSuccess;
  }

  /**
   * Determines if a prefix of a string (represented as a char[])
   * matches a given pattern.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The char[] to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if a prefix of input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(char[] input, Pattern pattern) {
    return matchesPrefix(input, pattern, 0);
  }

  /**
   * Determines if a prefix of a string matches a given pattern.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The String to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if a prefix of input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(String input, Pattern pattern) {
    return matchesPrefix(input.toCharArray(), pattern, 0);
  }

  /**
   * Determines if a prefix of a PatternMatcherInput instance
   * matches a given pattern.  If there is a match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  Unlike the
   * {@link #contains(PatternMatcherInput, Pattern)}
   * method, the current offset of the PatternMatcherInput argument
   * is not updated.  However, unlike the
   * {@link #matches matches(PatternMatcherInput, Pattern)} method,
   * matchesPrefix() will start its search from the current offset
   * rather than the begin offset of the PatternMatcherInput.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The PatternMatcherInput to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if a prefix of input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(PatternMatcherInput input, Pattern pattern) {
    char[] inp;
    Perl5Pattern expression;

    expression = (Perl5Pattern)pattern;

    __originalInput = input._originalBuffer;
    if(expression._isCaseInsensitive) {
      if(input._toLowerBuffer == null)
        input._toLowerBuffer = _toLower(__originalInput);
      inp = input._toLowerBuffer;
    } else
      inp = __originalInput;

    __initInterpreterGlobals(expression, inp, input._beginOffset,
                             input._endOffset, input._currentOffset);
    __lastSuccess = __tryExpression(input._currentOffset);
    __lastMatchResult = null;

    return __lastSuccess;
  }

  /**
   * Determines if a string (represented as a char[]) exactly
   * matches a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  The pattern must be
   * a Perl5Pattern instance, otherwise a ClassCastException will
   * be thrown.  You are not required to, and indeed should NOT try to
   * (for performance reasons), catch a ClassCastException because it
   * will never be thrown as long as you use a Perl5Pattern as the pattern
   * parameter.
   *
   * Note: matches() is not the same as sticking a ^ in front of
   * your expression and a $ at the end of your expression in Perl5
   * and using the =~ operator, even though in many cases it will be
   * equivalent.  matches() literally looks for an exact match according
   * to the rules of Perl5 expression matching.  Therefore, if you have
   * a pattern foo|foot and are matching the input foot
   * it will not produce an exact match.  But foot|foo will
   * produce an exact match for either foot or foo.
   * Remember, Perl5 regular expressions do not match the longest
   * possible match.  From the perlre manpage:
   *
   * Alternatives are tried from left to right, so the first
   * alternative found for which the entire expression matches,
   * is the one that is chosen. This means that alternatives
   * are not necessarily greedy. For example: when matching
   * foo|foot against "barefoot", only the "foo" part will
   * match, as that is the first alternative tried, and it
   * successfully matches the target string.
   *
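   * The foo|foot behavior described above can be reproduced in a small
   * sketch.  It is an assumption-laden illustration: it uses
   * java.util.regex rather than this class, and the helper name
   * exactByPrefix is hypothetical.  It mirrors the rule as documented here
   * (take the first match anchored at the start, then require that match
   * to end at the end of the input), which is not what
   * java.util.regex's own matches() does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactMatchSketch {
    // Take the first alternative that matches at the start of the input,
    // then check whether that particular match consumed the whole input.
    static boolean exactByPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() && m.end() == input.length();
    }

    public static void main(String[] args) {
        // "foo" is tried first and succeeds, leaving "t" unconsumed.
        System.out.println(exactByPrefix("foo|foot", "foot")); // false
        // "foot" is tried first and consumes the whole input.
        System.out.println(exactByPrefix("foot|foo", "foot")); // true
        // "foot" fails, then "foo" consumes the whole input.
        System.out.println(exactByPrefix("foot|foo", "foo"));  // true
    }
}
```

   * Ordering alternatives from longest to shortest is therefore the simple
   * way to get the expected exact-match result under this rule.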
   * @param input  The char[] to test for an exact match.
   * @param pattern  The Perl5Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean matches(char[] input, Pattern pattern) {
    Perl5Pattern expression;

    expression = (Perl5Pattern)pattern;

    __originalInput = input;
    if(expression._isCaseInsensitive)
      input = _toLower(input);

    __initInterpreterGlobals(expression, input, 0, input.length, 0);
    __lastSuccess = (__tryExpression(0) &&
                     __endMatchOffsets[0] == input.length);
    __lastMatchResult = null;

    return __lastSuccess;
  }

  /**
   * Determines if a string exactly matches a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  The pattern must be
   * a Perl5Pattern instance, otherwise a ClassCastException will
   * be thrown.  You are not required to, and indeed should NOT try to
   * (for performance reasons), catch a ClassCastException because it
   * will never be thrown as long as you use a Perl5Pattern as the pattern
   * parameter.
   *
   * Note: matches() is not the same as sticking a ^ in front of
   * your expression and a $ at the end of your expression in Perl5
   * and using the =~ operator, even though in many cases it will be
   * equivalent.  matches() literally looks for an exact match according
   * to the rules of Perl5 expression matching.  Therefore, if you have
   * a pattern foo|foot and are matching the input foot
   * it will not produce an exact match.  But foot|foo will
   * produce an exact match for either foot or foo.
   * Remember, Perl5 regular expressions do not match the longest
   * possible match.  From the perlre manpage:
   *
   * Alternatives are tried from left to right, so the first
   * alternative found for which the entire expression matches,
   * is the one that is chosen. This means that alternatives
   * are not necessarily greedy. For example: when matching
   * foo|foot against "barefoot", only the "foo" part will
   * match, as that is the first alternative tried, and it
   * successfully matches the target string.
   *
   * @param input  The String to test for an exact match.
   * @param pattern  The Perl5Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean matches(String input, Pattern pattern) {
    return matches(input.toCharArray(), pattern);
  }

  /**
   * Determines if the contents of a PatternMatcherInput instance
   * exactly match a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  Unlike the
   * {@link #contains(PatternMatcherInput, Pattern)}
   * method, the current offset of the PatternMatcherInput argument
   * is not updated.  You should remember that the region between
   * the begin (NOT the current) and end offsets of the PatternMatcherInput
   * will be tested for an exact match.
   *
   * The pattern must be a Perl5Pattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * a Perl5Pattern as the pattern parameter.
   *
   * Note: matches() is not the same as sticking a ^ in front of
   * your expression and a $ at the end of your expression in Perl5
   * and using the =~ operator, even though in many cases it will be
   * equivalent.  matches() literally looks for an exact match according
   * to the rules of Perl5 expression matching.  Therefore, if you have
   * a pattern foo|foot and are matching the input foot
   * it will not produce an exact match.  But foot|foo will
   * produce an exact match for either foot or foo.
   * Remember, Perl5 regular expressions do not match the longest
   * possible match.  From the perlre manpage:
   *
   * Alternatives are tried from left to right, so the first
   * alternative found for which the entire expression matches,
   * is the one that is chosen. This means that alternatives
   * are not necessarily greedy. For example: when matching
   * foo|foot against "barefoot", only the "foo" part will
   * match, as that is the first alternative tried, and it
   * successfully matches the target string.
   *
   * @param input  The PatternMatcherInput to test for a match.
   * @param pattern  The Perl5Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean matches(PatternMatcherInput input, Pattern pattern) {
    char[] inp;
    Perl5Pattern expression;

    expression = (Perl5Pattern)pattern;

    __originalInput = input._originalBuffer;
    if(expression._isCaseInsensitive) {
      if(input._toLowerBuffer == null)
        input._toLowerBuffer = _toLower(__originalInput);
      inp = input._toLowerBuffer;
    } else
      inp = __originalInput;

    __initInterpreterGlobals(expression, inp, input._beginOffset,
                             input._endOffset, input._beginOffset);

    __lastMatchResult = null;

    if(__tryExpression(input._beginOffset)) {
      if(__endMatchOffsets[0] == input._endOffset ||
         input.length() == 0 || input._beginOffset == input._endOffset) {
        __lastSuccess = true;
        return true;
      }
    }

    __lastSuccess = false;

    return false;
  }

  /**
   * Determines if a string contains a pattern.  If the pattern is
   * matched by some substring of the input, a MatchResult instance
   * representing the first such match is made accessible via
   * {@link #getMatch()}.  If you want to access
   * subsequent matches you should either use a PatternMatcherInput object
   * or use the offset information in the MatchResult to create a substring
   * representing the remaining input.  Using the MatchResult offset
   * information is the recommended method of obtaining the parts of the
   * string preceding the match and following the match.
   *
   * The pattern must be a Perl5Pattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * a Perl5Pattern as the pattern parameter.
   *
   * @param input  The String to test for a match.
   * @param pattern  The Perl5Pattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean contains(String input, Pattern pattern) {
    return contains(input.toCharArray(), pattern);
  }

  /**
   * Determines if a string (represented as a char[]) contains a pattern.
   * If the pattern is
   * matched by some substring of the input, a MatchResult instance
   * representing the first such match is made accessible via
   * {@link #getMatch()}.  If you want to access
   * subsequent matches you should either use a PatternMatcherInput object
   * or use the offset information in the MatchResult to create a substring
   * representing the remaining input.  Using the MatchResult offset
   * information is the recommended method of obtaining the parts of the
   * string preceding the match and following the match.
   *
   * The pattern must be a Perl5Pattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * a Perl5Pattern as the pattern parameter.
   *
   * @param input  The char[] to test for a match.
   * @param pattern  The Perl5Pattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean contains(char[] input, Pattern pattern) {
    Perl5Pattern expression;

    expression = (Perl5Pattern)pattern;

    __originalInput = input;
    if(expression._isCaseInsensitive)
      input = _toLower(input);

    return __interpret(expression, input, 0, input.length, 0);
  }

  private static final int __DEFAULT_LAST_MATCH_END_OFFSET = -100;
  private int __lastMatchInputEndOffset = __DEFAULT_LAST_MATCH_END_OFFSET;

  /**
   * Determines if the contents of a PatternMatcherInput, starting from the
   * current offset of the input, contain a pattern.
   * If a pattern match is found, a MatchResult
   * instance representing the first such match is made accessible via
   * {@link #getMatch()}.  The current offset of the
   * PatternMatcherInput is set to the offset corresponding to the end
   * of the match, so that a subsequent call to this method will continue
   * searching where the last call left off.  You should remember that the
   * region between the begin and end offsets of the PatternMatcherInput is
   * considered the input to be searched, and that the current offset
   * of the PatternMatcherInput reflects where a search will start from.
   * Matches extending beyond the end offset of the PatternMatcherInput
   * will not be matched.  In other words, a match must occur entirely
   * between the begin and end offsets of the input.  See
   * {@link PatternMatcherInput} for more details.
   *
   * As a side effect, if a match is found, the PatternMatcherInput match
   * offset information is updated.  See the
   * {@link PatternMatcherInput#setMatchOffsets(int, int)}
   * method for more details.
   *
   * The pattern must be a Perl5Pattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * a Perl5Pattern as the pattern parameter.
   *
   * This method is usually used in a loop as follows:
   *
   *
   *   PatternMatcher matcher;
   *   PatternCompiler compiler;
   *   Pattern pattern;
   *   PatternMatcherInput input;
   *   MatchResult result;
   *
   *   compiler = new Perl5Compiler();
   *   matcher  = new Perl5Matcher();
   *
   *   try {
   *     pattern = compiler.compile(somePatternString);
   *   } catch(MalformedPatternException e) {
   *     System.err.println("Bad pattern.");
   *     System.err.println(e.getMessage());
   *     return;
   *   }
   *
   *   input = new PatternMatcherInput(someStringInput);
   *
   *   while(matcher.contains(input, pattern)) {
   *     result = matcher.getMatch();
   *     // Perform whatever processing on the result you want.
   *   }
   *
   *
   * @param input  The PatternMatcherInput to test for a match.
   * @param pattern  The Pattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than a
   *         Perl5Pattern is passed as the pattern parameter.
   */
  public boolean contains(PatternMatcherInput input, Pattern pattern) {
    char[] inp;
    Perl5Pattern expression;
    boolean matchFound;

    //if(input.length() > 0) {
    // We want to allow a null string to match at the end of the input
    // which is why we don't check endOfInput.  Not sure if this is a
    // safe thing to do or not.
    if(input._currentOffset > input._endOffset)
      return false;
    //}
    /* else
      if(input._endOfInput())
        return false;
    */
    expression = (Perl5Pattern)pattern;

    // Todo:
    // Really should only reduce to lowercase that part of the
    // input that is necessary, instead of the whole thing.
    // Adjust MatchResult offsets accordingly.  Actually, pass an adjustment
    // value to __interpret.
    __originalInput = input._originalBuffer;
    if(expression._isCaseInsensitive) {
      if(input._toLowerBuffer == null)
        input._toLowerBuffer = _toLower(__originalInput);
      inp = input._toLowerBuffer;
    } else
      inp = __originalInput;

    __lastMatchInputEndOffset = input.getMatchEndOffset();

    matchFound =
      __interpret(expression, inp, input._beginOffset, input._endOffset,
                  input._currentOffset);

    if(matchFound) {
      input.setCurrentOffset(__endMatchOffsets[0]);
      input.setMatchOffsets(__beginMatchOffsets[0], __endMatchOffsets[0]);
    } else {
      input.setCurrentOffset(input._endOffset + 1);
    }

    // Restore so it doesn't interfere with other unrelated matches.
    __lastMatchInputEndOffset = __DEFAULT_LAST_MATCH_END_OFFSET;

    return matchFound;
  }

  /**
   * Fetches the last match found by a call to a matches() or contains()
   * method.  If you plan on modifying the original search input, you
   * must call this method BEFORE you modify the original search input,
   * as a lazy evaluation technique is used to create the MatchResult.
   * This reduces the cost of pattern matching when you don't care about
   * the actual match and only care if the pattern occurs in the input.
   * Otherwise, a MatchResult would be created for every match found,
   * whether or not the MatchResult was later used by a call to getMatch().
   *
   * @return A MatchResult instance containing the pattern match found
   *         by the last call to any one of the matches() or contains()
   *         methods.  If no match was found by the last call, returns
   *         null.
   */
  public MatchResult getMatch() {
    if(!__lastSuccess)
      return null;

    if(__lastMatchResult == null)
      __setLastMatchResult();

    return __lastMatchResult;
  }
}
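/*
A minimal sketch of the lazy-evaluation idea described for getMatch() above.
It is not the Perl5Matcher implementation: LazyMatchSketch, find() and
materializations are hypothetical names, and the "pattern" is a single
character so that the example stays self-contained.

```java
final class LazyMatchSketch {
  private char[] input;        // reference to the caller's array, not a copy
  private int begin, end;      // offsets of the last match
  private boolean lastSuccess;
  private String cached;       // built on demand by getMatch()
  int materializations;        // counts how often a result was actually built

  // Hypothetical stand-in for contains(): finds the first occurrence of target.
  boolean find(char[] in, char target) {
    input = in;
    cached = null;
    lastSuccess = false;
    for (int i = 0; i < in.length; i++) {
      if (in[i] == target) {
        begin = i;
        end = i + 1;
        lastSuccess = true;
        break;
      }
    }
    return lastSuccess;
  }

  // Returns null after a failed search; otherwise builds the result once.
  String getMatch() {
    if (!lastSuccess)
      return null;
    if (cached == null) {
      cached = new String(input, begin, end - begin);
      materializations++;
    }
    return cached;
  }

  static String demo() {
    LazyMatchSketch m = new LazyMatchSketch();
    char[] text = "hello".toCharArray();
    m.find(text, 'e');
    text[1] = 'E';          // input modified before getMatch() is called
    return m.getMatch();    // reflects the modified input
  }
}
```

Because the match text is built only inside getMatch(), changing the input
array between find() and getMatch() changes what is returned. That is the
reason the documentation above asks callers to fetch the match before
modifying the search input.
*/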
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/CharacterClassNode.java
/*
* $Id: CharacterClassNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
 * If you want to perform line by line
 * matches on an input stream, you should use a DataInput or BufferedReader
 * instance in conjunction
 * with one of the PatternMatcher methods taking a String, char[], or
 * PatternMatcherInput as an argument.  The DataInput and BufferedReader
 * readLine() methods will likely be implemented as native methods and
 * therefore more efficient than supporting line by line searching within
 * AwkStreamInput.
 *
 * In the future the programmer will be able to set this class to save
 * all the input it sees so that it can be accessed later.  This will avoid
 * having to read a stream more than once for whatever reason.
 *
 * @version @version@
 * @since 1.0
 * @see AwkMatcher
 */
public final class AwkStreamInput {
  static final int _DEFAULT_BUFFER_INCREMENT = 2048;
  private Reader __searchStream;
  private int __bufferIncrementUnit;

  boolean _endOfStreamReached;
  // The offset into the stream corresponding to buffer[0]
  int _bufferSize, _bufferOffset, _currentOffset;
  char[] _buffer;

  /**
   * We use this default constructor only within the package to create a dummy
   * AwkStreamInput instance.
   */
  AwkStreamInput() {
    _currentOffset = 0;
  }

  /**
   * Creates an AwkStreamInput instance bound to a Reader with a
   * specified initial buffer size and default buffer increment.
   *
   * @param input  The Reader to associate with the AwkStreamInput
   *       instance.
   * @param bufferIncrement  The initial buffer size and the default buffer
   *       increment to use when the input buffer has to be increased in
   *       size.
   */
  public AwkStreamInput(Reader input, int bufferIncrement) {
    __searchStream = input;
    __bufferIncrementUnit = bufferIncrement;
    _buffer = new char[bufferIncrement];
    _bufferOffset = _bufferSize = _currentOffset = 0;
    _endOfStreamReached = false;
  }

  /**
   * Creates an AwkStreamInput instance bound to a Reader with an
   * initial buffer size and default buffer increment of 2048 chars.
   *
   * @param input  The Reader to associate with the AwkStreamInput
   *       instance.
   */
  public AwkStreamInput(Reader input) {
    this(input, _DEFAULT_BUFFER_INCREMENT);
  }

  // Only called when buffer overflows
  int _reallocate(int initialOffset) throws IOException {
    int offset, bytesRead;
    char[] tmpBuffer;

    if(_endOfStreamReached)
      return _bufferSize;

    // Number of buffered chars from initialOffset onward that must be kept.
    offset    = _bufferSize - initialOffset;
    tmpBuffer = new char[offset + __bufferIncrementUnit];

    // Read new input directly behind the space reserved for the kept chars.
    bytesRead =
      __searchStream.read(tmpBuffer, offset, __bufferIncrementUnit);

    if(bytesRead <= 0){
      _endOfStreamReached = true;
      /* bytesRead should never equal zero, but if it does, we don't
         want to continue to try and read, running the risk of entering
         an infinite loop.  Throw an IOException instead, because this
         really IS an exception. */
      if(bytesRead == 0)
        throw new IOException("read from input stream returned 0 bytes.");
      return _bufferSize;
    } else {
      // _buffer[0] now corresponds to the old _buffer[initialOffset].
      _bufferOffset += initialOffset;
      _bufferSize = offset + bytesRead;

      System.arraycopy(_buffer, initialOffset, tmpBuffer, 0, offset);
      _buffer = tmpBuffer;
    }

    return offset;
  }

  // Replaces the buffer contents with the next chunk of the stream.
  boolean read() throws IOException {
    _bufferOffset+=_bufferSize;
    _bufferSize = __searchStream.read(_buffer);
    _endOfStreamReached = (_bufferSize == -1);
    return (!_endOfStreamReached);
  }

  public boolean endOfStream() { return _endOfStreamReached; }
}
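/*
A minimal sketch of the buffer-growing step that _reallocate() performs above:
keep the unconsumed tail of the buffer, read more input behind it, and shift
the stream offset of buffer[0] accordingly. GrowingBufferSketch, grow() and
demo() are hypothetical names, and unlike _reallocate() this sketch does not
throw when a read returns zero chars.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class GrowingBufferSketch {
  private final Reader reader;
  private final int increment;
  char[] buffer;
  int size;          // number of valid chars in buffer
  int streamOffset;  // stream offset corresponding to buffer[0]
  boolean endOfStream;

  GrowingBufferSketch(Reader reader, int increment) {
    this.reader = reader;
    this.increment = increment;
    this.buffer = new char[increment];
  }

  // Keeps buffer[keepFrom..size) and appends up to increment new chars.
  // Returns the index at which the new chars start, or size at end of stream.
  int grow(int keepFrom) throws IOException {
    if (endOfStream)
      return size;
    int kept = size - keepFrom;
    char[] tmp = new char[kept + increment];
    int charsRead = reader.read(tmp, kept, increment);
    if (charsRead <= 0) {
      endOfStream = true;
      return size;
    }
    System.arraycopy(buffer, keepFrom, tmp, 0, kept);
    buffer = tmp;
    streamOffset += keepFrom;
    size = kept + charsRead;
    return kept;
  }

  static String demo() {
    try {
      GrowingBufferSketch b =
        new GrowingBufferSketch(new StringReader("abcdefgh"), 4);
      b.grow(0);   // buffer holds "abcd"
      b.grow(2);   // keeps "cd", appends "efgh"
      return new String(b.buffer, 0, b.size) + "@" + b.streamOffset;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading straight into the new array behind the reserved prefix avoids a
second copy of the freshly read chars; only the kept tail is moved.
*/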
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/StarNode.java
/*
* $Id: StarNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
 * It is important for you to remember that AwkMatcher does not save
 * parenthesized sub-group information.  Therefore the number of groups
 * saved in a MatchResult produced by AwkMatcher will always be 1.
 *
 * @version @version@
 * @since 1.0
 * @see org.apache.oro.text.regex.PatternMatcher
 * @see AwkCompiler
 */
public final class AwkMatcher implements PatternMatcher {
  private int __lastMatchedBufferOffset;
  private AwkMatchResult __lastMatchResult = null;
  private AwkStreamInput __scratchBuffer, __streamSearchBuffer;
  private AwkPattern __awkPattern;
  private int __offsets[] = new int[2];

  /**
   * A kluge variable to make PatternMatcherInput matches work when
   * their begin offset is non-zero.  This kluge is caused by the
   * misguided notion that AwkStreamInput could be overloaded to do
   * both stream and fixed buffer matches.  The whole input representation
   * scheme has to be scrapped and redone. -- dfs 2001/07/10
   */
  private int __beginOffset;

  public AwkMatcher() {
    __scratchBuffer = new AwkStreamInput();
    __scratchBuffer._endOfStreamReached = true;
  }

  /**
   * Determines if a prefix of a string (represented as a char[])
   * matches a given pattern, starting from a given offset into the string.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The char[] to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @param offset The offset at which to start searching for the prefix.
   * @return True if input matches pattern, false otherwise.
   */
  // I reimplemented this method in terms of __streamMatchPrefix
  // to reduce the code size.  This is not very elegant and
  // reduces performance by a small degree.
  public boolean matchesPrefix(char[] input, Pattern pattern, int offset){
    int result = -1;

    __awkPattern = (AwkPattern)pattern;

    __scratchBuffer._buffer = input;
    __scratchBuffer._bufferSize = input.length;
    __scratchBuffer._bufferOffset = __beginOffset = 0;
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    __offsets[0] = offset;
    try {
      result = __streamMatchPrefix();
    } catch(IOException e){
      // Don't do anything because we're not doing any I/O
      result = -1;
    }

    if(result < 0) {
      __lastMatchResult = null;
      return false;
    }

    // The matched text begins at the requested offset.
    __lastMatchResult =
      new AwkMatchResult(new String(input, offset, result), offset);

    return true;
  }

  /**
   * Determines if a prefix of a string (represented as a char[])
   * matches a given pattern.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The char[] to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(char[] input, Pattern pattern){
    return matchesPrefix(input, pattern, 0);
  }

  /**
   * Determines if a prefix of a string matches a given pattern.
   * If a prefix of the string matches the pattern, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The String to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(String input, Pattern pattern) {
    return matchesPrefix(input.toCharArray(), pattern, 0);
  }

  /**
   * Determines if a prefix of a PatternMatcherInput instance
   * matches a given pattern.  If there is a match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  Unlike the
   * {@link #contains(PatternMatcherInput, Pattern)}
   * method, the current offset of the PatternMatcherInput argument
   * is not updated.  You should remember that the region starting
   * from the current offset of the PatternMatcherInput will be
   * tested for a prefix match.
   *
   * This method is useful for certain common token identification tasks
   * that are made more difficult without this functionality.
   *
   * @param input  The PatternMatcherInput to test for a prefix match.
   * @param pattern  The Pattern to be matched.
   * @return True if input matches pattern, false otherwise.
   */
  public boolean matchesPrefix(PatternMatcherInput input, Pattern pattern){
    int result = -1;

    __awkPattern = (AwkPattern)pattern;
    __scratchBuffer._buffer = input.getBuffer();
    __scratchBuffer._bufferOffset = __beginOffset = input.getBeginOffset();
    __offsets[0] = input.getCurrentOffset();

    __scratchBuffer._bufferSize = input.length();
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    try {
      result = __streamMatchPrefix();
    } catch(IOException e) {
      // Don't do anything because we're not doing any I/O
      result = -1;
    }

    if(result < 0) {
      __lastMatchResult = null;
      return false;
    }

    __lastMatchResult =
      new AwkMatchResult(new String(__scratchBuffer._buffer, __offsets[0],
                                    result), __offsets[0]);

    return true;
  }

  /**
   * Determines if a string (represented as a char[]) exactly
   * matches a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  The pattern must be
   * an AwkPattern instance, otherwise a ClassCastException will
   * be thrown.  You are not required to, and indeed should NOT try to
   * (for performance reasons), catch a ClassCastException because it
   * will never be thrown as long as you use an AwkPattern as the pattern
   * parameter.
   *
   * @param input  The char[] to test for an exact match.
   * @param pattern  The AwkPattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean matches(char[] input, Pattern pattern) {
    int result = -1;

    __awkPattern = (AwkPattern)pattern;
    __scratchBuffer._buffer = input;
    __scratchBuffer._bufferSize = input.length;
    __scratchBuffer._bufferOffset = __beginOffset = 0;
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    __offsets[0] = 0;
    try {
      result = __streamMatchPrefix();
    } catch(IOException e){
      // Don't do anything because we're not doing any I/O
      result = -1;
    }

    if(result != input.length) {
      __lastMatchResult = null;
      return false;
    }

    __lastMatchResult =
      new AwkMatchResult(new String(input, 0, result), 0);

    return true;
  }

  /**
   * Determines if a string exactly matches a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  The pattern must be
   * an AwkPattern instance, otherwise a ClassCastException will
   * be thrown.  You are not required to, and indeed should NOT try to
   * (for performance reasons), catch a ClassCastException because it
   * will never be thrown as long as you use an AwkPattern as the pattern
   * parameter.
   *
   * @param input  The String to test for an exact match.
   * @param pattern  The AwkPattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean matches(String input, Pattern pattern){
    return matches(input.toCharArray(), pattern);
  }

  /**
   * Determines if the contents of a PatternMatcherInput instance
   * exactly matches a given pattern.  If
   * there is an exact match, a MatchResult instance
   * representing the match is made accessible via
   * {@link #getMatch()}.  Unlike the
   * {@link #contains(PatternMatcherInput, Pattern)}
   * method, the current offset of the PatternMatcherInput argument
   * is not updated.  You should remember that the region between
   * the begin and end offsets of the PatternMatcherInput will be
   * tested for an exact match.
   *
   * The pattern must be an AwkPattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * an AwkPattern as the pattern parameter.
   *
   * @param input  The PatternMatcherInput to test for a match.
   * @param pattern  The AwkPattern to be matched.
   * @return True if input matches pattern, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean matches(PatternMatcherInput input, Pattern pattern){
    int result = -1;

    __awkPattern = (AwkPattern)pattern;
    __scratchBuffer._buffer = input.getBuffer();
    __scratchBuffer._bufferSize = input.length();
    __scratchBuffer._bufferOffset = __beginOffset = input.getBeginOffset();
    __offsets[0] = input.getBeginOffset();
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    try {
      result = __streamMatchPrefix();
    } catch(IOException e){
      // Don't do anything because we're not doing any I/O
      result = -1;
    }

    if(result != __scratchBuffer._bufferSize) {
      __lastMatchResult = null;
      return false;
    }

    __lastMatchResult =
      new AwkMatchResult(new String(__scratchBuffer._buffer, __offsets[0],
                                    __scratchBuffer._bufferSize),
                         __offsets[0]);

    return true;
  }

  /**
   * Determines if a string (represented as a char[]) contains a pattern.
   * If the pattern is matched by some substring of the input, a MatchResult
   * instance representing the first such match is made accessible via
   * {@link #getMatch()}.  If you want to access
   * subsequent matches you should either use a PatternMatcherInput object
   * or use the offset information in the MatchResult to create a substring
   * representing the remaining input.  Using the MatchResult offset
   * information is the recommended method of obtaining the parts of the
   * string preceding the match and following the match.
   *
   * The pattern must be an AwkPattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * an AwkPattern as the pattern parameter.
   *
   * @param input  The char[] to test for a match.
   * @param pattern  The AwkPattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean contains(char[] input, Pattern pattern) {
    __awkPattern = (AwkPattern)pattern;

    // Begin anchor requires match occur at beginning of input
    if(__awkPattern._hasBeginAnchor &&
       !__awkPattern._fastMap[input[0]]){
      __lastMatchResult = null;
      return false;
    }

    __scratchBuffer._buffer = input;
    __scratchBuffer._bufferSize = input.length;
    __scratchBuffer._bufferOffset = __beginOffset = 0;
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    __lastMatchedBufferOffset = 0;
    try {
      _search();
    } catch(IOException e) {
      // do nothing
    }
    return (__lastMatchResult != null);
  }

  /**
   * Determines if a string contains a pattern.  If the pattern is
   * matched by some substring of the input, a MatchResult instance
   * representing the first such match is made accessible via
   * {@link #getMatch()}.  If you want to access
   * subsequent matches you should either use a PatternMatcherInput object
   * or use the offset information in the MatchResult to create a substring
   * representing the remaining input.  Using the MatchResult offset
   * information is the recommended method of obtaining the parts of the
   * string preceding the match and following the match.
   *
   * The pattern must be an AwkPattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * an AwkPattern as the pattern parameter.
   *
   * @param input  The String to test for a match.
   * @param pattern  The AwkPattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean contains(String input, Pattern pattern){
    return contains(input.toCharArray(), pattern);
  }

  /**
   * Determines if the contents of a PatternMatcherInput, starting from the
   * current offset of the input, contains a pattern.
   * If a pattern match is found, a MatchResult
   * instance representing the first such match is made accessible via
   * {@link #getMatch()}.  The current offset of the
   * PatternMatcherInput is set to the offset corresponding to the end
   * of the match, so that a subsequent call to this method will continue
   * searching where the last call left off.  You should remember that the
   * region between the begin and end offsets of the PatternMatcherInput is
   * considered the input to be searched, and that the current offset
   * of the PatternMatcherInput reflects where a search will start from.
   * Matches extending beyond the end offset of the PatternMatcherInput
   * will not be matched.  In other words, a match must occur entirely
   * between the begin and end offsets of the input.  See
   * {@link org.apache.oro.text.regex.PatternMatcherInput PatternMatcherInput}
   * for more details.
   *
   * As a side effect, if a match is found, the PatternMatcherInput match
   * offset information is updated.  See the PatternMatcherInput
   * {@link org.apache.oro.text.regex.PatternMatcherInput#setMatchOffsets
   * setMatchOffsets(int, int)} method for more details.
   *
   * The pattern must be an AwkPattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * an AwkPattern as the pattern parameter.
   *
   * This method is usually used in a loop as follows:
   *
   *
   *   PatternMatcher matcher;
   *   PatternCompiler compiler;
   *   Pattern pattern;
   *   PatternMatcherInput input;
   *   MatchResult result;
   *
   *   compiler = new AwkCompiler();
   *   matcher  = new AwkMatcher();
   *
   *   try {
   *     pattern = compiler.compile(somePatternString);
   *   } catch(MalformedPatternException e) {
   *     System.err.println("Bad pattern.");
   *     System.err.println(e.getMessage());
   *     return;
   *   }
   *
   *   input = new PatternMatcherInput(someStringInput);
   *
   *   while(matcher.contains(input, pattern)) {
   *     result = matcher.getMatch();
   *     // Perform whatever processing on the result you want.
   *   }
   *
   *
   * @param input  The PatternMatcherInput to test for a match.
   * @param pattern  The Pattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   */
  public boolean contains(PatternMatcherInput input, Pattern pattern) {
    __awkPattern = (AwkPattern)pattern;
    __scratchBuffer._buffer = input.getBuffer();
    __scratchBuffer._bufferOffset = __beginOffset = input.getBeginOffset();
    __lastMatchedBufferOffset = input.getCurrentOffset();

    // Begin anchor requires match occur at beginning of input
    // No need to adjust current offset if no match found.
    if(__awkPattern._hasBeginAnchor) {
      if(__beginOffset != __lastMatchedBufferOffset ||
         !__awkPattern._fastMap[__scratchBuffer._buffer[__beginOffset]]) {
        __lastMatchResult = null;
        return false;
      }
    }

    __scratchBuffer._bufferSize = input.length();
    __scratchBuffer._endOfStreamReached = true;
    __streamSearchBuffer = __scratchBuffer;
    try {
      _search();
    } catch(IOException e) {
      // do nothing
    }
    input.setCurrentOffset(__lastMatchedBufferOffset);

    if(__lastMatchResult == null)
      return false;

    input.setMatchOffsets(__lastMatchResult.beginOffset(0),
                          __lastMatchResult.endOffset(0));

    return true;
  }

  /**
   * Determines if the contents of an AwkStreamInput, starting from the
   * current offset of the input, contains a pattern.
   * If a pattern match is found, a MatchResult
   * instance representing the first such match is made accessible via
   * {@link #getMatch()}.  The current offset of the
   * input stream is advanced to the end offset corresponding to the end
   * of the match.  Consequently a subsequent call to this method will continue
   * searching where the last call left off.
   * See {@link AwkStreamInput} for more details.
   *
   * Note, patterns matching the null string do NOT match at end of input
   * stream.  This is different from the behavior you get from the other
   * contains() methods.
   *
   * The pattern must be an AwkPattern instance, otherwise a
   * ClassCastException will be thrown.  You are not required to, and
   * indeed should NOT try to (for performance reasons), catch a
   * ClassCastException because it will never be thrown as long as you use
   * an AwkPattern as the pattern parameter.
   *
   * This method is usually used in a loop as follows:
   *
   *
   *   AwkMatcher matcher;
   *   PatternCompiler compiler;
   *   Pattern pattern;
   *   AwkStreamInput input;
   *   MatchResult result;
   *
   *   compiler = new AwkCompiler();
   *   matcher  = new AwkMatcher();
   *
   *   try {
   *     pattern = compiler.compile(somePatternString);
   *   } catch(MalformedPatternException e) {
   *     System.err.println("Bad pattern.");
   *     System.err.println(e.getMessage());
   *     return;
   *   }
   *
   *   input = new AwkStreamInput(
   *             new BufferedReader(new FileReader(someFileName)));
   *
   *   while(matcher.contains(input, pattern)) {
   *     result = matcher.getMatch();
   *     // Perform whatever processing on the result you want.
   *   }
   *
   *
   * @param input  The AwkStreamInput to test for a match.
   * @param pattern  The Pattern to be matched.
   * @return True if the input contains a pattern match, false otherwise.
   * @exception ClassCastException If a Pattern instance other than an
   *         AwkPattern is passed as the pattern parameter.
   * @exception IOException If an I/O error occurs while reading the
   *         input stream.
   */
  public boolean contains(AwkStreamInput input, Pattern pattern)
       throws IOException
  {
    __awkPattern = (AwkPattern)pattern;

    // Begin anchor requires match occur at beginning of input
    if(__awkPattern._hasBeginAnchor) {
      // Do read here instead of in _search() so we can test first char
      if(input._bufferOffset == 0) {
        if(input.read() && !__awkPattern._fastMap[input._buffer[0]]) {
          __lastMatchResult = null;
          return false;
        }
      } else {
        __lastMatchResult = null;
        return false;
      }
    }

    __lastMatchedBufferOffset = input._currentOffset;
    __streamSearchBuffer      = input;
    __beginOffset             = 0;
    _search();
    input._currentOffset = __lastMatchedBufferOffset;

    if(__lastMatchResult != null) {
      // Adjust match begin offset to be relative to beginning of stream.
      __lastMatchResult._incrementMatchBeginOffset(input._bufferOffset);
      return true;
    }

    return false;
  }

  private int __streamMatchPrefix() throws IOException {
    int token, current = AwkPattern._START_STATE, lastState;
    int offset, initialOffset, maxOffset;
    int lastMatchedOffset = -1;
    int[] tstateArray;

    offset = initialOffset = __offsets[0];
    maxOffset = __streamSearchBuffer._bufferSize + __beginOffset;

  test:
    while(offset < maxOffset) {
      token = __streamSearchBuffer._buffer[offset++];

      if(current < __awkPattern._numStates) {
        lastState   = current;
        tstateArray = __awkPattern._getStateArray(current);
        current     = tstateArray[token];

        if(current == 0){
          __awkPattern._createNewState(lastState, token, tstateArray);
          current = tstateArray[token];
        }

        if(current == AwkPattern._INVALID_STATE){
          break test;
        } else if(__awkPattern._endStates.get(current)){
          lastMatchedOffset = offset;
        }

        if(offset == maxOffset){
          offset =
            __streamSearchBuffer._reallocate(initialOffset) + __beginOffset;

          maxOffset = __streamSearchBuffer._bufferSize + __beginOffset;

          // If we're at the end of the stream, don't reset values
          if(offset != maxOffset){
            if(lastMatchedOffset != -1)
              lastMatchedOffset-=initialOffset;
            initialOffset = 0;
          }
        }
      } else
        break;
    }

    __offsets[0] = initialOffset;
    __offsets[1] = lastMatchedOffset - 1;

    if(lastMatchedOffset == -1 && __awkPattern._matchesNullString)
      return 0;

    // End anchor requires match occur at end of input
    if(__awkPattern._hasEndAnchor &&
       (!__streamSearchBuffer._endOfStreamReached ||
        lastMatchedOffset < __streamSearchBuffer._bufferSize + __beginOffset))
      return -1;

    return (lastMatchedOffset - initialOffset);
  }

  void _search() throws IOException {
    int position, tokensMatched;

    __lastMatchResult = null;

    while(true){
      if(__lastMatchedBufferOffset >=
         __streamSearchBuffer._bufferSize + __beginOffset) {
        if(__streamSearchBuffer._endOfStreamReached){
          // Get rid of reference now that it should no longer be used.
__streamSearchBuffer = null; return; } else { if(!__streamSearchBuffer.read()) return; __lastMatchedBufferOffset = 0; } } for(position = __lastMatchedBufferOffset; position < __streamSearchBuffer._bufferSize + __beginOffset; position = __offsets[0] + 1) { __offsets[0] = position; if(__awkPattern._fastMap[__streamSearchBuffer._buffer[position]] && (tokensMatched = __streamMatchPrefix()) > -1) { __lastMatchResult = new AwkMatchResult( new String(__streamSearchBuffer._buffer, __offsets[0], tokensMatched), __offsets[0]); __lastMatchedBufferOffset = (tokensMatched > 0 ? __offsets[1] + 1 : __offsets[0] + 1); return; } else if(__awkPattern._matchesNullString) { __lastMatchResult = new AwkMatchResult(new String(), position); __lastMatchedBufferOffset = position + 1; return; } } __lastMatchedBufferOffset = position; } } /** * Fetches the last match found by a call to a matches() or contains() * method. *
* @return A MatchResult instance containing the pattern match found
* by the last call to any one of the matches() or contains()
* methods. If no match was found by the last call, returns
* null.
*/
public MatchResult getMatch() { return __lastMatchResult; }
}
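The `_search()` loop above checks `__awkPattern._fastMap` before it attempts a prefix match, so positions whose first character cannot begin a match are rejected with a single array lookup. The following is a minimal sketch of that prefilter idea only. It assumes a hypothetical set of literal alternatives in place of the real DFA, and `FastMapSketch`, `buildFastMap`, and `indexOfAny` are illustrative names that do not exist in the library.

```java
final class FastMapSketch {
    // True at index c if some alternative starts with character c (8-bit range only).
    static boolean[] buildFastMap(String[] alternatives) {
        boolean[] fastMap = new boolean[256];
        for (String alt : alternatives) {
            if (!alt.isEmpty() && alt.charAt(0) < 256) {
                fastMap[alt.charAt(0)] = true;
            }
        }
        return fastMap;
    }

    // Returns the offset of the first occurrence of any alternative, or -1.
    static int indexOfAny(char[] input, String[] alternatives) {
        boolean[] fastMap = buildFastMap(alternatives);
        for (int pos = 0; pos < input.length; pos++) {
            char c = input[pos];
            // Cheap rejection: skip positions that cannot start a match.
            if (c >= 256 || !fastMap[c]) {
                continue;
            }
            for (String alt : alternatives) {
                if (startsWithAt(input, pos, alt)) {
                    return pos;
                }
            }
        }
        return -1;
    }

    // Compares the alternative against the input starting at pos.
    private static boolean startsWithAt(char[] input, int pos, String alt) {
        if (alt.isEmpty() || pos + alt.length() > input.length) {
            return false;
        }
        for (int i = 0; i < alt.length(); i++) {
            if (input[pos + i] != alt.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] alts = {"cat", "dog"};
        System.out.println(indexOfAny("hot dog".toCharArray(), alts));
    }
}
```

The expensive comparison runs only at positions that pass the lookup, which is presumably why the matcher consults the fast map first.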
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/SyntaxNode.java
/*
* $Id: SyntaxNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
*
* @param pos A single element array containing a variable representing
* the current position. It is made an array to cause it
* to be passed by reference to allow incrementing.
*/
abstract SyntaxNode _clone(int pos[]);
}
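The `_clone(int pos[])` contract above wraps the position in a one-element array because Java passes `int` arguments by value, so a plain `int` parameter could not be incremented for the caller. A minimal sketch of the idiom follows. `PositionCounterSketch` and its methods are hypothetical names used only for illustration and are not part of the syntax tree classes.

```java
final class PositionCounterSketch {
    // Returns the current position and advances the shared counter.
    static int assignPosition(int[] pos) {
        return pos[0]++;
    }

    // Numbers `count` leaves; the caller observes the advanced counter in pos[0].
    static int[] numberLeaves(int count, int[] pos) {
        int[] positions = new int[count];
        for (int i = 0; i < count; i++) {
            positions[i] = assignPosition(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        int[] pos = {3};
        int[] assigned = numberLeaves(2, pos);
        System.out.println(assigned[0] + " " + assigned[1] + " next=" + pos[0]);
    }
}
```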
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/OrNode.java
/*
* $Id: OrNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
*
* @param group The pattern subgroup.
* @return The offset of the first token in the indicated
* pattern subgroup. If a group was never matched or does
* not exist, returns -1.
*/
public int beginOffset(int group){
  return (group == 0 ? __matchBeginOffset : -1);
}
/**
* Returns an offset marking the end of the pattern match
* relative to the beginning of the input.
*
* @param group The pattern subgroup.
* @return Returns one plus the offset of the last token in
* the indicated pattern subgroup. If a group was never matched
* or does not exist, returns -1. A group matching the null
* string will return its start offset.
*/
public int endOffset(int group){
return (group == 0 ? __matchBeginOffset + __length : -1);
}
/**
* The same as group(0).
*
* @return A string containing the entire match.
*/
public String toString() { return group(0); }
}
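The offset methods above recognise only group 0, the whole match, and derive the end offset as the begin offset plus the match length. A match of the null string therefore reports an end offset equal to its begin offset. The sketch below restates that arithmetic in a standalone class. `WholeMatchSketch` is a hypothetical stand-in and not the library's match result class.

```java
final class WholeMatchSketch {
    private final String match;
    private final int beginOffset;

    WholeMatchSketch(String match, int beginOffset) {
        this.match = match;
        this.beginOffset = beginOffset;
    }

    // Only group 0 (the whole match) exists; any other group reports -1.
    int beginOffset(int group) {
        return group == 0 ? beginOffset : -1;
    }

    // One past the last matched token; equals beginOffset for an empty match.
    int endOffset(int group) {
        return group == 0 ? beginOffset + match.length() : -1;
    }

    public static void main(String[] args) {
        WholeMatchSketch result = new WholeMatchSketch("abc", 4);
        System.out.println(result.beginOffset(0) + ".." + result.endOffset(0));
    }
}
```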
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/PlusNode.java
/*
* $Id: PlusNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
*
* @return The original string representation of the regular expression
* pattern.
*/
public String getPattern() { return _expression; }
/**
* This method returns an integer containing the compilation options used
* to compile this pattern.
*
* @return The compilation options used to compile the pattern.
*/
public int getOptions() { return _options; }
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/EpsilonNode.java
/*
* $Id: EpsilonNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
*
* The supported regular expression syntax is a superset of traditional AWK, * but NOT to be confused with GNU AWK or other AWK variants. Additionally, * this AWK implementation is DFA-based and only supports 8-bit ASCII. * Consequently, these classes can perform very fast pattern matches in * most cases. *
* This is the traditional Awk syntax that is supported: *
* This is the extended syntax that is supported: *
* @param pattern An Awk regular expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the regular expression. Currently the * only meaningful flag is AwkCompiler.CASE_INSENSITIVE_MASK. * @return A Pattern instance constituting the compiled regular expression. * This instance will always be an AwkPattern and can be reliably * be casted to an AwkPattern. * @exception MalformedPatternException If the compiled expression * is not a valid Awk regular expression. */ public Pattern compile(char[] pattern, int options) throws MalformedPatternException { SyntaxTree tree; AwkPattern regexp; __beginAnchor = __endAnchor = false; __caseSensitive = ((options & CASE_INSENSITIVE_MASK) == 0); __multiline = ((options & MULTILINE_MASK) != 0); tree = _parse(pattern); regexp = new AwkPattern(new String(pattern), tree); regexp._options = options; regexp._hasBeginAnchor = __beginAnchor; regexp._hasEndAnchor = __endAnchor; return regexp; } /** * Compiles an Awk regular expression into an AwkPattern instance that * can be used by an AwkMatcher object to perform pattern matching. *
* @param pattern An Awk regular expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the regular expression. Currently the * only meaningful flag is AwkCompiler.CASE_INSENSITIVE_MASK. * @return A Pattern instance constituting the compiled regular expression. * This instance will always be an AwkPattern and can be reliably * be casted to an AwkPattern. * @exception MalformedPatternException If the compiled expression * is not a valid Awk regular expression. */ public Pattern compile(String pattern, int options) throws MalformedPatternException { SyntaxTree tree; AwkPattern regexp; __beginAnchor = __endAnchor = false; __caseSensitive = ((options & CASE_INSENSITIVE_MASK) == 0); __multiline = ((options & MULTILINE_MASK) != 0); tree = _parse(pattern.toCharArray()); regexp = new AwkPattern(pattern, tree); regexp._options = options; regexp._hasBeginAnchor = __beginAnchor; regexp._hasEndAnchor = __endAnchor; return regexp; } /** * Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK); *
* @param pattern A regular expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be an AwkPattern and can reliably
* be cast to an AwkPattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Awk regular expression.
*/
public Pattern compile(char[] pattern) throws MalformedPatternException {
  return compile(pattern, DEFAULT_MASK);
}
/**
* Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
*
* @param pattern A regular expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be an AwkPattern and can reliably
* be cast to an AwkPattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Awk regular expression.
*/
public Pattern compile(String pattern) throws MalformedPatternException {
return compile(pattern, DEFAULT_MASK);
}
}
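The `compile` methods above decode the `options` argument with bitwise tests: case sensitivity is the absence of `CASE_INSENSITIVE_MASK`, and multiline mode is the presence of `MULTILINE_MASK`. The sketch below shows that decoding on its own. The numeric mask values are assumptions chosen for illustration (the real constants are defined in `AwkCompiler`), and `OptionMaskSketch` is a hypothetical class name.

```java
final class OptionMaskSketch {
    // Assumed values for illustration only; the real constants live in AwkCompiler.
    static final int DEFAULT_MASK = 0;
    static final int CASE_INSENSITIVE_MASK = 0x0001;
    static final int MULTILINE_MASK = 0x0002;

    // Case sensitive unless the case-insensitive bit is set.
    static boolean isCaseSensitive(int options) {
        return (options & CASE_INSENSITIVE_MASK) == 0;
    }

    // Multiline only when the multiline bit is set.
    static boolean isMultiline(int options) {
        return (options & MULTILINE_MASK) != 0;
    }

    public static void main(String[] args) {
        int options = CASE_INSENSITIVE_MASK | MULTILINE_MASK;
        System.out.println(isCaseSensitive(options) + " " + isMultiline(options));
    }
}
```

Because each flag occupies its own bit, callers can combine options with `|` and the compiler can test each one independently.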
jakarta-oro-2.0.8/src/java/org/apache/oro/text/awk/TokenNode.java
/*
* $Id: TokenNode.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
*
* @param message A message indicating the nature of the error.
*/
public MalformedPerl5PatternException(String message) {
  super(message);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/perl/package.html
This package used to be the PerlTools library and adds Perl5 regular expression syntactic sugar built on top of the {@link org.apache.oro.text.regex} Perl5 regular expression classes. jakarta-oro-2.0.8/src/java/org/apache/oro/text/perl/Perl5Util.java 0000644 0001750 0001750 00000136020 07773723336 024221 0 ustar arnaud arnaud /* * $Id: Perl5Util.java,v 1.19 2003/11/07 20:16:25 dfs Exp $ * * ==================================================================== * The Apache Software License, Version 1.1 * * Copyright (c) 2000 The Apache Software Foundation. All rights * reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. The end-user documentation included with the redistribution, * if any, must include the following acknowledgment: * "This product includes software developed by the * Apache Software Foundation (http://www.apache.org/)." * Alternately, this acknowledgment may appear in the software itself, * if and wherever such third-party acknowledgments normally appear. * * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro" * must not be used to endorse or promote products derived from this * software without prior written permission. For written * permission, please contact apache@apache.org. * * 5. Products derived from this software may not be called "Apache" * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their * name, without prior written permission of the Apache Software Foundation. 
* * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see ** The objective of the class is to minimize the amount of code a Java * programmer using Jakarta-ORO * has to write to achieve the same results as Perl by * transparently handling regular expression compilation, caching, and * matching. A second objective is to use the same Perl pattern matching * syntax to ease the task of Perl programmers transitioning to Java * (this also reduces the number of parameters to a method). * All the state affecting methods are synchronized to avoid * the maintenance of explicit locks in multithreaded programs. This * philosophy differs from the * {@link org.apache.oro.text.regex} package, where * you are expected to either maintain explicit locks, or more preferably * create separate compiler and matcher instances for each thread. *
* To use this class, first create an instance using the default constructor * or initialize the instance with a PatternCache of your choosing using * the alternate constructor. The default cache used by Perl5Util is a * PatternCacheLRU of capacity GenericPatternCache.DEFAULT_CAPACITY. You may * want to create a cache with a different capacity, a different * cache replacement policy, or even devise your own PatternCache * implementation. The PatternCacheLRU is probably the best general purpose * pattern cache, but your specific application may be better served by * a different cache replacement policy. You should remember that you can * front-load a cache with all the patterns you will be using before * initializing a Perl5Util instance, or you can just let Perl5Util * fill the cache as you use it. *
* You might use the class as follows: *
* Perl5Util util = new Perl5Util();
* String line;
* DataInputStream input;
* PrintStream output;
*
* // Initialization of input and output omitted
* while((line = input.readLine()) != null) {
*   // First find the line with the string we want to substitute because
*   // it is cheaper than blindly substituting each line.
*   if(util.match("/HREF=\"description1.html\"/", line)) {
*     line = util.substitute("s/description1\\.html/about1.html/", line);
*   }
*   output.println(line);
* }
*
* A couple of things to remember when using this class are that the
* {@link #match match()} methods have the same meaning as
* {@link org.apache.oro.text.regex.Perl5Matcher#contains
* Perl5Matcher.contains()}
* and <code>=~ m/pattern/</code> in Perl. The methods are named <code>match</code>
* to more closely associate them with Perl and to differentiate them
* from {@link org.apache.oro.text.regex.Perl5Matcher#matches
* Perl5Matcher.matches()}.
* A further thing to keep in mind is that the
* {@link MalformedPerl5PatternException} class is derived from
* RuntimeException which means you DON'T have to catch it. The reasoning
* behind this is that you will detect your regular expression mistakes
* as you write and debug your program when a MalformedPerl5PatternException
* is thrown during a test run. However, we STRONGLY recommend that you
* ALWAYS catch MalformedPerl5PatternException whenever you deal with a
* DYNAMICALLY created pattern. Relying on a fatal
* MalformedPerl5PatternException being thrown to detect errors while
* debugging is only useful for dealing with static patterns, that is, actual
* pregenerated strings present in your program. Patterns created from user
* input or some other dynamic method CANNOT be relied upon to be correct
* and MUST be handled by catching MalformedPerl5PatternException for your
* programs to be robust.
*
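The advice above applies to any regular expression API that reports syntax errors through an unchecked exception. To stay self-contained, the sketch below swaps in `java.util.regex` for Perl5Util: its `PatternSyntaxException` is likewise a `RuntimeException`, so the compiler does not force a handler. `DynamicPatternSketch` and `safeContains` are hypothetical names, and treating a malformed pattern as "no match" is just one possible policy.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class DynamicPatternSketch {
    // Returns true if userPattern occurs in input; returns false if it does
    // not, or if the pattern text is not a valid regular expression.
    static boolean safeContains(String userPattern, String input) {
        try {
            return Pattern.compile(userPattern).matcher(input).find();
        } catch (PatternSyntaxException e) {
            // Unchecked, so nothing reminds us to handle it; do so explicitly.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeContains("a+b", "xaab"));
        System.out.println(safeContains("(", "anything"));
    }
}
```

With Perl5Util the same shape would presumably apply, with the handler catching MalformedPerl5PatternException around the call that takes the dynamic pattern.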
* Finally, as a convenience Perl5Util implements * the {@link org.apache.oro.text.regex.MatchResult MatchResult} interface. * The methods are merely wrappers which call the corresponding method of * the last {@link org.apache.oro.text.regex.MatchResult MatchResult} * found (which can be accessed with {@link #getMatch()}) by a match or * substitution (or even a split, but this isn't particularly useful). * At the moment, the * {@link org.apache.oro.text.regex.MatchResult MatchResult} returned * by {@link #getMatch()} is not stored in a thread-local variable. Therefore * concurrent calls to {@link #getMatch()} will produce unpredictable * results. So if your concurrent program requires the match results, * you must protect the matching and the result retrieval in a critical * section. If you do not need match results, you don't need to do anything * special. If you feel the J2SE implementation of {@link #getMatch()} * should use a thread-local variable and obviate the need for a critical * section, please express your views on the oro-dev mailing list. * * @version @version@ * @since 1.0 * @see MalformedPerl5PatternException * @see org.apache.oro.text.PatternCache * @see org.apache.oro.text.PatternCacheLRU * @see org.apache.oro.text.regex.MatchResult */ public final class Perl5Util implements MatchResult { /** The regular expression to use to parse match expression. */ private static final String __matchExpression = "m?(\\W)(.*)\\1([imsx]*)"; /** The pattern cache to compile and store patterns */ private PatternCache __patternCache; /** The hashtable to cache higher-level expressions */ private Cache __expressionCache; /** The pattern matcher to perform matching operations. */ private Perl5Matcher __matcher; /** The compiled match expression parsing regular expression. */ private Pattern __matchPattern; /** The last match from a successful call to a matching method. 
*/ private MatchResult __lastMatch; /** * A container for temporarily holding the results of a split before * deleting trailing empty fields. */ private ArrayList __splitList; /** * Keeps track of the original input (for postMatch() and preMatch()) * methods. This will be discarded if the preMatch() and postMatch() * methods are moved into the MatchResult interface. */ private Object __originalInput; /** * Keeps track of the begin and end offsets of the original input for * the postMatch() and preMatch() methods. */ private int __inputBeginOffset, __inputEndOffset; /** Used for default return value of post and pre Match() */ private static final String __nullString = ""; /** * A constant passed to the {@link #split split()} methods indicating * that all occurrences of a pattern should be used to split a string. */ public static final int SPLIT_ALL = Util.SPLIT_ALL; /** * A secondary constructor for Perl5Util. It initializes the Perl5Matcher * used by the class to perform matching operations, but requires the * programmer to provide a PatternCache instance for the class * to use to compile and store regular expressions. You would want to * use this constructor if you want to change the capacity or policy * of the cache used. Example uses might be: *
* // We know we're going to use close to 50 expressions a whole lot, so * // we create a cache of the proper size. * util = new Perl5Util(new PatternCacheLRU(50)); ** or *
* // We're only going to use a few expressions and know that second-chance * // fifo is best suited to the order in which we are using the patterns. * util = new Perl5Util(new PatternCacheFIFO2(10)); **/ public Perl5Util(PatternCache cache) { __splitList = new ArrayList(); __matcher = new Perl5Matcher(); __patternCache = cache; __expressionCache = new CacheLRU(cache.capacity()); __compilePatterns(); } /** * Default constructor for Perl5Util. This initializes the Perl5Matcher * used by the class to perform matching operations and creates a * default PatternCacheLRU instance to use to compile and cache regular * expressions. The size of this cache is * GenericPatternCache.DEFAULT_CAPACITY. */ public Perl5Util() { this(new PatternCacheLRU()); } /** * Compiles the patterns (currently only the match expression) used to * parse Perl5 expressions. Right now it initializes __matchPattern. */ private void __compilePatterns() { Perl5Compiler compiler = new Perl5Compiler(); try { __matchPattern = compiler.compile(__matchExpression, Perl5Compiler.SINGLELINE_MASK); } catch(MalformedPatternException e) { // This should only happen during debugging. //e.printStackTrace(); throw new RuntimeException(e.getMessage()); } } /** * Parses a match expression and returns a compiled pattern. * First checks the expression cache and if the pattern is not found, * then parses the expression and fetches a compiled pattern from the * pattern cache. Otherwise, just uses the pattern found in the * expression cache. __matchPattern is used to parse the expression. *
* @param pattern The Perl5 match expression to parse. * @exception MalformedPerl5PatternException If there is an error parsing * the expression. */ private Pattern __parseMatchExpression(String pattern) throws MalformedPerl5PatternException { int index, compileOptions; String options, regex; MatchResult result; Object obj; Pattern ret; obj = __expressionCache.getElement(pattern); // Must catch ClassCastException because someone might incorrectly // pass an s/// expression. try block is cheaper than checking // instanceof try { if(obj != null) return (Pattern)obj; } catch(ClassCastException e) { // Fall through and parse expression } if(!__matcher.matches(pattern, __matchPattern)) throw new MalformedPerl5PatternException("Invalid expression: " + pattern); result = __matcher.getMatch(); regex = result.group(2); compileOptions = Perl5Compiler.DEFAULT_MASK; options = result.group(3); if(options != null) { index = options.length(); while(index-- > 0) { switch(options.charAt(index)) { case 'i' : compileOptions |= Perl5Compiler.CASE_INSENSITIVE_MASK; break; case 'm' : compileOptions |= Perl5Compiler.MULTILINE_MASK; break; case 's' : compileOptions |= Perl5Compiler.SINGLELINE_MASK; break; case 'x' : compileOptions |= Perl5Compiler.EXTENDED_MASK; break; default : throw new MalformedPerl5PatternException("Invalid options: " + options); } } } ret = __patternCache.getPattern(regex, compileOptions); __expressionCache.addElement(pattern, ret); return ret; } /** * Searches for the first pattern match somewhere in a character array * taking a pattern specified in Perl5 native format: *
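* [m]/pattern/[i][m][s][x]
* The m prefix is optional and the meanings of the optional
* trailing options are: i (case insensitive match), m (treat the input
* as multiple lines), s (treat the input as a single line), and
* x (extended expression syntax).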
* * If the input contains the pattern, the org.apache.oro.text.regex.MatchResult * can be obtained by calling {@link #getMatch()}. * However, Perl5Util implements the MatchResult interface as a wrapper * around the last MatchResult found, so you can call its methods to * access match information. *
* @param pattern The pattern to search for.
* @param input The char[] input to search.
* @return True if the input contains the pattern, false otherwise.
* @exception MalformedPerl5PatternException If there is an error in
* the pattern. You are not forced to catch this exception
* because it is derived from RuntimeException.
*/
public synchronized boolean match(String pattern, char[] input)
throws MalformedPerl5PatternException
{
boolean result;
// The expression is parsed (or fetched from the expression cache) once.
result = __matcher.contains(input, __parseMatchExpression(pattern));
if(result) {
__lastMatch = __matcher.getMatch();
__originalInput = input;
__inputBeginOffset = 0;
__inputEndOffset = input.length;
}
return result;
}
/**
* Searches for the first pattern match in a String taking
* a pattern specified in Perl5 native format:
*
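* [m]/pattern/[i][m][s][x]
* The m prefix is optional and the meanings of the optional
* trailing options are: i (case insensitive match), m (treat the input
* as multiple lines), s (treat the input as a single line), and
* x (extended expression syntax).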
* * If the input contains the pattern, the * {@link org.apache.oro.text.regex.MatchResult MatchResult} * can be obtained by calling {@link #getMatch()}. * However, Perl5Util implements the MatchResult interface as a wrapper * around the last MatchResult found, so you can call its methods to * access match information. *
* @param pattern The pattern to search for. * @param input The String input to search. * @return True if the input contains the pattern, false otherwise. * @exception MalformedPerl5PatternException If there is an error in * the pattern. You are not forced to catch this exception * because it is derived from RuntimeException. */ public synchronized boolean match(String pattern, String input) throws MalformedPerl5PatternException { return match(pattern, input.toCharArray()); } /** * Searches for the next pattern match somewhere in a * org.apache.oro.text.regex.PatternMatcherInput instance, taking * a pattern specified in Perl5 native format: *
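* [m]/pattern/[i][m][s][x]
* The m prefix is optional and the meanings of the optional
* trailing options are: i (case insensitive match), m (treat the input
* as multiple lines), s (treat the input as a single line), and
* x (extended expression syntax).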
* * If the input contains the pattern, the * {@link org.apache.oro.text.regex.MatchResult MatchResult} * can be obtained by calling {@link #getMatch()}. * However, Perl5Util implements the MatchResult interface as a wrapper * around the last MatchResult found, so you can call its methods to * access match information. * After the call to this method, the PatternMatcherInput current offset * is advanced to the end of the match, so you can use it to repeatedly * search for expressions in the entire input using a while loop as * explained in the {@link org.apache.oro.text.regex.PatternMatcherInput * PatternMatcherInput} documentation. *
* @param pattern The pattern to search for. * @param input The PatternMatcherInput to search. * @return True if the input contains the pattern, false otherwise. * @exception MalformedPerl5PatternException If there is an error in * the pattern. You are not forced to catch this exception * because it is derived from RuntimeException. */ public synchronized boolean match(String pattern, PatternMatcherInput input) throws MalformedPerl5PatternException { boolean result; result = __matcher.contains(input, __parseMatchExpression(pattern)); if(result) { __lastMatch = __matcher.getMatch(); __originalInput = input.getInput(); __inputBeginOffset = input.getBeginOffset(); __inputEndOffset = input.getEndOffset(); } return result; } /** * Returns the last match found by a call to a match(), substitute(), or * split() method. This method is only intended for use to retrieve a match * found by the last match found by a match() method. This method should * be used when you want to save MatchResult instances. Otherwise, for * simply accessing match information, it is more convenient to use the * Perl5Util methods implementing the MatchResult interface. *
* @return The org.apache.oro.text.regex.MatchResult instance containing the * last match found. */ public synchronized MatchResult getMatch() { return __lastMatch; } /** * Substitutes a pattern in a given input with a replacement string. * The substitution expression is specified in Perl5 native format: *
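* s/pattern/replacement/[g][i][m][o][s][x]
* The s prefix is mandatory and the meanings of the optional
* trailing options are: g (substitute all occurrences), i (case
* insensitive match), m (treat the input as multiple lines),
* o (interpolate the replacement only once), s (treat the input as a
* single line), and x (extended expression syntax).
* Any non-alphanumeric character other than '-' may be used as the
* delimiter in place of the slashes, which helps avoid backslashing.
* Using slashes you would have to write:
* numSubs = util.substitute(result, "s/foo\\/bar/goo\\/\\/baz/", input);
* when you could more easily write:
* numSubs = util.substitute(result, "s#foo/bar#goo//baz#", input);
* where the hashmarks are used instead of slashes.
*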
* There is a special case of backslashing that you need to pay attention * to. As demonstrated above, to denote a delimiter in the substituted * string it must be backslashed. However, this can be a problem * when you want to denote a backslash at the end of the substituted * string. As of PerlTools 1.3, a new means of handling this * situation has been implemented. * In previous versions, the behavior was that *
* "... a double backslash (quadrupled in the Java String) always * represents two backslashes unless the second backslash is followed * by the delimiter, in which case it represents a single backslash." **
* The new behavior is that a backslash is always a backslash
* in the substitution portion of the expression unless it is used to
* escape a delimiter. A backslash is considered to escape a delimiter
* if an even number of contiguous backslashes precede the backslash
* and the delimiter following the backslash is not the FINAL delimiter
* in the expression. Therefore, backslashes preceding final delimiters
* are never considered to escape the delimiter. The following, which
* used to be an invalid expression and require a special-case extra
* backslash, will now replace all instances of / with \:
*
** numSubs = util.substitute(result, "s#/#\\#g", input); *
* @param result The StringBuffer in which to store the result of the * substitutions. The buffer is only appended to. * @param expression The Perl5 substitution regular expression. * @param input The input on which to perform substitutions. * @return The number of substitutions made. * @exception MalformedPerl5PatternException If there is an error in * the expression. You are not forced to catch this exception * because it is derived from RuntimeException. * @since 2.0.6 */ // Expression parsing will have to be moved into a separate method if // there are going to be variations of this method. public synchronized int substitute(StringBuffer result, String expression, String input) throws MalformedPerl5PatternException { boolean backslash, finalDelimiter; int index, compileOptions, numSubstitutions, numInterpolations; int firstOffset, secondOffset, thirdOffset, subCount; StringBuffer replacement; Pattern compiledPattern; char exp[], delimiter; ParsedSubstitutionEntry entry; Perl5Substitution substitution; Object obj; obj = __expressionCache.getElement(expression); __nullTest: if(obj != null) { // Must catch ClassCastException because someone might incorrectly // pass an m// expression. try block is cheaper than checking // instanceof. We want to go ahead with parsing just in case so // we break. try { entry = (ParsedSubstitutionEntry)obj; } catch(ClassCastException e) { break __nullTest; } subCount = Util.substitute(result, __matcher, entry._pattern, entry._substitution, input, entry._numSubstitutions); __lastMatch = __matcher.getMatch(); return subCount; } exp = expression.toCharArray(); // Make sure basic conditions for a valid substitution expression hold. 
if(exp.length < 4 || exp[0] != 's' || Character.isLetterOrDigit(exp[1]) || exp[1] == '-') throw new MalformedPerl5PatternException("Invalid expression: " + expression); delimiter = exp[1]; firstOffset = 2; secondOffset = thirdOffset = -1; backslash = false; // Parse pattern for(index = firstOffset; index < exp.length; index++) { if(exp[index] == '\\') backslash = !backslash; else if(exp[index] == delimiter && !backslash) { secondOffset = index; break; } else if(backslash) backslash = !backslash; } if(secondOffset == -1 || secondOffset == exp.length - 1) throw new MalformedPerl5PatternException("Invalid expression: " + expression); // Parse replacement string backslash = false; finalDelimiter = true; replacement = new StringBuffer(exp.length - secondOffset); for(index = secondOffset + 1; index < exp.length; index++) { if(exp[index] == '\\') { backslash = !backslash; // 05/05/99 dfs // We unbackslash backslashed delimiters in the replacement string // only if we're on an odd backslash and there is another occurrence // of a delimiter later in the string. 
if(backslash && index + 1 < exp.length && exp[index + 1] == delimiter && expression.lastIndexOf(delimiter, exp.length - 1) != (index + 1)) { finalDelimiter = false; continue; } } else if(exp[index] == delimiter && finalDelimiter) { thirdOffset = index; break; } else { backslash = false; finalDelimiter = true; } replacement.append(exp[index]); } if(thirdOffset == -1) throw new MalformedPerl5PatternException("Invalid expression: " + expression); compileOptions = Perl5Compiler.DEFAULT_MASK; numSubstitutions = 1; // Single quotes cause no interpolations to be performed in replacement if(delimiter != '\'') numInterpolations = Perl5Substitution.INTERPOLATE_ALL; else numInterpolations = Perl5Substitution.INTERPOLATE_NONE; // Parse options for(index = thirdOffset + 1; index < exp.length; index++) { switch(exp[index]) { case 'i' : compileOptions |= Perl5Compiler.CASE_INSENSITIVE_MASK; break; case 'm' : compileOptions |= Perl5Compiler.MULTILINE_MASK; break; case 's' : compileOptions |= Perl5Compiler.SINGLELINE_MASK; break; case 'x' : compileOptions |= Perl5Compiler.EXTENDED_MASK; break; case 'g' : numSubstitutions = Util.SUBSTITUTE_ALL; break; case 'o' : numInterpolations = 1; break; default : throw new MalformedPerl5PatternException("Invalid option: " + exp[index]); } } compiledPattern = __patternCache.getPattern(new String(exp, firstOffset, secondOffset - firstOffset), compileOptions); substitution = new Perl5Substitution(replacement.toString(), numInterpolations); entry = new ParsedSubstitutionEntry(compiledPattern, substitution, numSubstitutions); __expressionCache.addElement(expression, entry); subCount = Util.substitute(result, __matcher, compiledPattern, substitution, input, numSubstitutions); __lastMatch = __matcher.getMatch(); return subCount; } /** * Substitutes a pattern in a given input with a replacement string. * The substitution expression is specified in Perl5 native format. *
** String result; * StringBuffer buffer = new StringBuffer(); * perl.substitute(buffer, expression, input); * result = buffer.toString(); *
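* [m]/pattern/[i][m][s][x]
* The m prefix is optional and the meanings of the optional
* trailing options are: i (case insensitive match), m (treat the input
* as multiple lines), s (treat the input as a single line), and
* x (extended expression syntax).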
* The limit parameter causes the string to be split on at most the first
* limit - 1 number of pattern occurrences.
*
* Of special note is that this split method performs EXACTLY the same
* as the Perl split() function. In other words, if the split pattern
* contains parentheses, additional Vector elements are created from
* each of the matching subgroups in the pattern. Using an example
* similar to the one from the Camel book:
* split(list, "/([,-])/", "8-12,15,18")
* produces the Vector containing:
* { "8", "-", "12", ",", "15", ",", "18" }
* Furthermore, the following Perl behavior is observed: "leading empty
* fields are preserved, and empty trailing ones are deleted." This
* has the effect that a split on a zero length string returns an empty
* list.
* The {@link org.apache.oro.text.regex.Util#split Util.split()} method
* does NOT implement these behaviors because it is intended to
* be a general self-consistent and predictable split function usable
* with Pattern instances other than Perl5Pattern.
*
* @param results A Collection to which the substrings of the input
* that occur between the regular expression delimiter occurrences
* are appended. The input will not be split into any more substrings
* than the specified limit. A way of thinking of this is that only the
* first limit - 1 matches of the delimiting regular expression will be
* used to split the input. The Collection must support the
* addAll(Collection) operation.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @param limit The limit on the number of substrings the input is split into.
* Values <= 0 produce the same behavior as the SPLIT_ALL constant which
* causes the limit to be ignored and splits to be performed on all
* occurrences of the pattern. You should use the SPLIT_ALL constant
* to achieve this behavior instead of relying on the default behavior
* associated with non-positive limit values.
* @exception MalformedPerl5PatternException If there is an error in
* the expression. You are not forced to catch this exception
* because it is derived from RuntimeException.
*/
public synchronized void split(Collection results, String pattern,
String input, int limit)
throws MalformedPerl5PatternException
{
int beginOffset, groups, index;
String group;
MatchResult currentResult = null;
PatternMatcherInput pinput;
Pattern compiledPattern;
compiledPattern = __parseMatchExpression(pattern);
pinput = new PatternMatcherInput(input);
beginOffset = 0;
while(--limit != 0 && __matcher.contains(pinput, compiledPattern)) {
currentResult = __matcher.getMatch();
__splitList.add(input.substring(beginOffset,
currentResult.beginOffset(0)));
if((groups = currentResult.groups()) > 1) {
for(index = 1; index < groups; ++index) {
group = currentResult.group(index);
if(group != null && group.length() > 0)
__splitList.add(group);
}
}
beginOffset = currentResult.endOffset(0);
}
__splitList.add(input.substring(beginOffset, input.length()));
// Remove all trailing empty fields.
for(int i = __splitList.size() - 1; i >= 0; --i) {
String str;
str = (String)__splitList.get(i);
if(str.length() == 0)
__splitList.remove(i);
else
break;
}
results.addAll(__splitList);
__splitList.clear();
// Just for the sake of completeness
__lastMatch = currentResult;
}
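// The split() loop above (sub-group fields, the limit - 1 rule, and removal
// of trailing empty fields) can be sketched in isolation as follows. This is
// a minimal illustration under stated assumptions, not the ORO code:
// PerlSplitSketch is a hypothetical class, it uses java.util.regex in place
// of Perl5Matcher, and it takes a bare regex rather than a /pattern/ string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jakarta ORO.
public final class PerlSplitSketch {
  // Assumption: mirrors the documented rule that values <= 0 mean "no limit".
  public static final int SPLIT_ALL = 0;

  public static List<String> split(String regex, String input, int limit) {
    List<String> fields = new ArrayList<String>();
    Matcher matcher = Pattern.compile(regex).matcher(input);
    int beginOffset = 0;

    // Pre-decrementing means at most limit - 1 matches are consumed.
    // For a non-positive limit the counter does not reach zero in practice.
    while (--limit != 0 && matcher.find()) {
      fields.add(input.substring(beginOffset, matcher.start()));
      // Non-empty parenthesized sub-groups become fields of their own.
      for (int g = 1; g <= matcher.groupCount(); ++g) {
        String group = matcher.group(g);
        if (group != null && group.length() > 0)
          fields.add(group);
      }
      beginOffset = matcher.end();
    }
    fields.add(input.substring(beginOffset));

    // Remove all trailing empty fields; leading empty fields are kept.
    for (int i = fields.size() - 1; i >= 0; --i) {
      if (fields.get(i).length() == 0)
        fields.remove(i);
      else
        break;
    }
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(split("([,-])", "8-12,15,18", SPLIT_ALL));
    System.out.println(split("[,-]", "8-12,15,18", 2));
  }
}
```

// With a limit of 2 only the first delimiter is used, so the remainder of
// the input is returned as a single final field.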
/**
* This method is identical to calling:
* split(results, pattern, input, SPLIT_ALL);
*/
public synchronized void split(Collection results, String pattern, String input)
throws MalformedPerl5PatternException
{
split(results, pattern, input, SPLIT_ALL);
}
/**
* Splits input in the default Perl manner, splitting on all whitespace.
* This method is identical to calling:
* split(results, "/\\s+/", input);
*/
public synchronized void split(Collection results, String input)
throws MalformedPerl5PatternException
{
split(results, "/\\s+/", input);
}
/**
* Splits a String into strings contained in a Vector of size no greater
* than a specified limit. The String is split using a regular expression
* as the delimiter. The regular expression is a pattern specified
* in Perl5 native format:
*
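* [m]/pattern/[i][m][s][x]
* The m prefix is optional and the meanings of the optional
* trailing options are: i (case insensitive match), m (treat the input
* as multiple lines), s (treat the input as a single line), and
* x (extended expression syntax).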
* The limit parameter causes the string to be split on at most the first
* limit - 1 number of pattern occurrences.
*
* Of special note is that this split method performs EXACTLY the same
* as the Perl split() function. In other words, if the split pattern
* contains parentheses, additional Vector elements are created from
* each of the matching subgroups in the pattern. Using an example
* similar to the one from the Camel book:
* split("/([,-])/", "8-12,15,18")
* produces the Vector containing:
* { "8", "-", "12", ",", "15", ",", "18" }
* The {@link org.apache.oro.text.regex.Util#split Util.split()} method
* does NOT implement this particular behavior because it is intended to
* be usable with Pattern instances other than Perl5Pattern.
*
* @deprecated Use
* {@link #split(Collection results, String pattern, String input, int limit)}
* instead.
* @param pattern The regular expression to use as a split delimiter.
* @param input The String to split.
* @param limit The limit on the size of the returned Vector.
* Values <= 0 produce the same behavior as the SPLIT_ALL constant which
* causes the limit to be ignored and splits to be performed on all
* occurrences of the pattern. You should use the SPLIT_ALL constant
* to achieve this behavior instead of relying on the default behavior
* associated with non-positive limit values.
* @return A Vector containing the substrings of the input
* that occur between the regular expression delimiter occurrences. The
* input will not be split into any more substrings than the specified
* limit. A way of thinking of this is that only the first limit - 1
* matches of the delimiting regular expression will be used to split the
* input.
* @exception MalformedPerl5PatternException If there is an error in
* the expression. You are not forced to catch this exception
* because it is derived from RuntimeException.
*/
public synchronized Vector split(String pattern, String input, int limit)
throws MalformedPerl5PatternException
{
Vector results = new Vector(20);
split(results, pattern, input, limit);
return results;
}
/**
* This method is identical to calling:
* split(pattern, input, SPLIT_ALL);
* @deprecated Use
* {@link #split(Collection results, String pattern, String input)} instead.
*/
public synchronized Vector split(String pattern, String input)
throws MalformedPerl5PatternException
{
return split(pattern, input, SPLIT_ALL);
}
/**
* Splits input in the default Perl manner, splitting on all whitespace.
* This method is identical to calling:
* split("/\\s+/", input);
* @deprecated Use
* {@link #split(Collection results, String input)} instead.
*/
public synchronized Vector split(String input)
throws MalformedPerl5PatternException
{
return split("/\\s+/", input);
}
//
// MatchResult interface methods.
//
/**
* Returns the length of the last match found.
*
* @return The length of the last match found. */ public synchronized int length() { return __lastMatch.length(); } /** * @return The number of groups contained in the last match found. * This number includes the 0th group. In other words, the * result refers to the number of parenthesized subgroups plus * the entire match itself. */ public synchronized int groups() { return __lastMatch.groups(); } /** * Returns the contents of the parenthesized subgroups of the last match * found according to the behavior dictated by the MatchResult interface. *
* @param group The pattern subgroup to return.
* @return A string containing the indicated pattern subgroup. Group
* 0 always refers to the entire match. If a group was never
* matched, it returns null. This is not to be confused with
* a group matching the null string, which will return a String
* of length 0.
*/
public synchronized String group(int group) {
return __lastMatch.group(group);
}
/**
* Returns the begin offset of the subgroup of the last match found
* relative to the beginning of the match.
*
* @param group The pattern subgroup.
* @return The offset into group 0 of the first token in the indicated
* pattern subgroup. If a group was never matched or does
* not exist, returns -1. Be aware that a group that matches
* the null string at the end of a match will have an offset
* equal to the length of the string, so you shouldn't blindly
* use the offset to index an array or String.
*/
public synchronized int begin(int group) {
return __lastMatch.begin(group);
}
/**
* Returns the end offset of the subgroup of the last match found
* relative to the beginning of the match.
*
* @param group The pattern subgroup. * @return Returns one plus the offset into group 0 of the last token in * the indicated pattern subgroup. If a group was never matched * or does not exist, returns -1. A group matching the null * string will return its start offset. */ public synchronized int end(int group) { return __lastMatch.end(group); } /** * Returns an offset marking the beginning of the last pattern match * found relative to the beginning of the input from which the match * was extracted. *
* @param group The pattern subgroup. * @return The offset of the first token in the indicated * pattern subgroup. If a group was never matched or does * not exist, returns -1. */ public synchronized int beginOffset(int group) { return __lastMatch.beginOffset(group); } /** * Returns an offset marking the end of the last pattern match found * relative to the beginning of the input from which the match was * extracted. *
* @param group The pattern subgroup. * @return Returns one plus the offset of the last token in * the indicated pattern subgroup. If a group was never matched * or does not exist, returns -1. A group matching the null * string will return its start offset. */ public synchronized int endOffset(int group) { return __lastMatch.endOffset(group); } /** * Returns the same as group(0). *
* @return A string containing the entire match. */ public synchronized String toString() { if(__lastMatch == null) return null; return __lastMatch.toString(); } /** * Returns the part of the input preceding the last match found. *
* @return The part of the input preceding the last match found.
*/
public synchronized String preMatch() {
int begin;
if(__originalInput == null)
return __nullString;
begin = __lastMatch.beginOffset(0);
if(begin <= 0)
return __nullString;
if(__originalInput instanceof char[]) {
char[] input;
input = (char[])__originalInput;
// Just in case we make sure begin offset is in bounds. It should
// be but we're paranoid.
if(begin > input.length)
begin = input.length;
// The third argument is a count, not an end index, so subtract the
// input begin offset (as preMatchCharArray() does).
return new String(input, __inputBeginOffset, begin - __inputBeginOffset);
} else if(__originalInput instanceof String) {
String input;
input = (String)__originalInput;
// Just in case we make sure begin offset is in bounds. It should
// be but we're paranoid.
if(begin > input.length())
begin = input.length();
return input.substring(__inputBeginOffset, begin);
}
return __nullString;
}
/**
* Returns the part of the input following the last match found.
*
* @return The part of the input following the last match found. */ public synchronized String postMatch() { int end; if(__originalInput == null) return __nullString; end = __lastMatch.endOffset(0); if(end < 0) return __nullString; if(__originalInput instanceof char[]) { char[] input; input = (char[])__originalInput; // Just in case we make sure begin offset is in bounds. It should // be but we're paranoid. if(end >= input.length) return __nullString; return new String(input, end, __inputEndOffset - end); } else if(__originalInput instanceof String) { String input; input = (String)__originalInput; // Just in case we make sure begin offset is in bounds. It should // be but we're paranoid. if(end >= input.length()) return __nullString; return input.substring(end, __inputEndOffset); } return __nullString; } /** * Returns the part of the input preceding the last match found as a * char array. This method eliminates the extra * buffer copying caused by preMatch().toCharArray(). *
* @return The part of the input preceding the last match found as a char[].
* If the result is of zero length, returns null instead of a zero
* length array.
*/
public synchronized char[] preMatchCharArray() {
int begin;
char[] result = null;
if(__originalInput == null)
return null;
begin = __lastMatch.beginOffset(0);
if(begin <= 0)
return null;
if(__originalInput instanceof char[]) {
char[] input;
input = (char[])__originalInput;
// Just in case we make sure begin offset is in bounds. It should
// be but we're paranoid.
if(begin >= input.length)
begin = input.length;
result = new char[begin - __inputBeginOffset];
System.arraycopy(input, __inputBeginOffset, result, 0, result.length);
} else if(__originalInput instanceof String) {
String input;
input = (String)__originalInput;
// Just in case we make sure begin offset is in bounds. It should
// be but we're paranoid.
if(begin >= input.length())
begin = input.length();
result = new char[begin - __inputBeginOffset];
input.getChars(__inputBeginOffset, begin, result, 0);
}
return result;
}
/**
* Returns the part of the input following the last match found as a char
* array. This method eliminates the extra buffer copying caused by
* postMatch().toCharArray().
*
* @return The part of the input following the last match found as a char[].
* If the result is of zero length, returns null instead of a zero
* length array.
*/
public synchronized char[] postMatchCharArray() {
int end;
char[] result = null;
if(__originalInput == null)
return null;
end = __lastMatch.endOffset(0);
if(end < 0)
return null;
if(__originalInput instanceof char[]) {
int length;
char[] input;
input = (char[])__originalInput;
// Just in case we make sure end offset is in bounds. It should
// be but we're paranoid.
if(end >= input.length)
return null;
length = __inputEndOffset - end;
result = new char[length];
System.arraycopy(input, end, result, 0, length);
} else if(__originalInput instanceof String) {
String input;
input = (String)__originalInput;
// Just in case we make sure end offset is in bounds. It should
// be but we're paranoid.
if(end >= __inputEndOffset)
return null;
result = new char[__inputEndOffset - end];
input.getChars(end, __inputEndOffset, result, 0);
}
return result;
}
}
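// The delimiter and backslash rules described for substitute() above can be
// exercised on their own with a small sketch. This is an illustrative port
// of the two parsing loops shown in substitute(), under stated assumptions:
// SubstitutionExpressionSketch is a hypothetical class, it throws
// IllegalArgumentException instead of MalformedPerl5PatternException, and it
// returns the option letters unvalidated rather than compiling anything.

```java
// Hypothetical helper, not part of Jakarta ORO.
public final class SubstitutionExpressionSketch {

  /** Returns { pattern, replacement, options } for an s/// expression. */
  public static String[] parse(String expression) {
    char[] exp = expression.toCharArray();

    // Basic shape: 's', then a delimiter that is not a letter, digit or '-'.
    if (exp.length < 4 || exp[0] != 's'
        || Character.isLetterOrDigit(exp[1]) || exp[1] == '-')
      throw new IllegalArgumentException("Invalid expression: " + expression);

    char delimiter = exp[1];
    int firstOffset = 2, secondOffset = -1, thirdOffset = -1;
    boolean backslash = false;

    // Pattern: ends at the first delimiter not preceded by an odd backslash.
    for (int index = firstOffset; index < exp.length; index++) {
      if (exp[index] == '\\')
        backslash = !backslash;
      else if (exp[index] == delimiter && !backslash) {
        secondOffset = index;
        break;
      } else if (backslash)
        backslash = false;
    }
    if (secondOffset == -1 || secondOffset == exp.length - 1)
      throw new IllegalArgumentException("Invalid expression: " + expression);

    // Replacement: a backslash escapes a delimiter only when it is an odd
    // backslash and that delimiter is not the last one in the expression.
    backslash = false;
    boolean finalDelimiter = true;
    StringBuilder replacement = new StringBuilder();
    for (int index = secondOffset + 1; index < exp.length; index++) {
      if (exp[index] == '\\') {
        backslash = !backslash;
        if (backslash && index + 1 < exp.length && exp[index + 1] == delimiter
            && expression.lastIndexOf(delimiter, exp.length - 1) != index + 1) {
          finalDelimiter = false;
          continue; // drop the escaping backslash
        }
      } else if (exp[index] == delimiter && finalDelimiter) {
        thirdOffset = index;
        break;
      } else {
        backslash = false;
        finalDelimiter = true;
      }
      replacement.append(exp[index]);
    }
    if (thirdOffset == -1)
      throw new IllegalArgumentException("Invalid expression: " + expression);

    return new String[] {
      new String(exp, firstOffset, secondOffset - firstOffset),
      replacement.toString(),
      new String(exp, thirdOffset + 1, exp.length - thirdOffset - 1)
    };
  }

  public static void main(String[] args) {
    for (String part : parse("s#/#\\#g"))
      System.out.println(part);
  }
}
```

// Note that the pattern portion keeps its backslashes (they are left for the
// regular expression compiler), while escaping backslashes in the
// replacement portion are removed.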
jakarta-oro-2.0.8/src/java/org/apache/oro/text/perl/ParsedSubstitutionEntry.java 0000644 0001750 0001750 00000006236 07773723336 027276 0 ustar arnaud arnaud /*
* $Id: ParsedSubstitutionEntry.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* <http://www.apache.org/>.
* To completely understand how to use MatchActionProcessor, you should first
* look at {@link MatchAction} and {@link MatchActionInfo}.
* A MatchActionProcessor is first initialized with
* the desired PatternCompiler and PatternMatcher instances to use to compile
* patterns and perform matches. Then, optionally, a field separator may
* be registered with {@link #setFieldSeparator setFieldSeparator()}.
* Finally, as many pattern action pairs as desired are registered with
* {@link #addAction addAction()} before processing the input
* with {@link #processMatches processMatches()}. Pattern action
* pairs are processed in the order they were registered.
*
* The look of added actions can closely mirror that of AWK when anonymous * classes are used. Here's an example of how you might use * MatchActionProcessor to extract only the second column of a semicolon * delimited file: *
*
* import java.io.*;
*
* import org.apache.oro.text.*;
* import org.apache.oro.text.regex.*;
*
* public final class semicolon {
*
*   public static final void main(String[] args) {
*     MatchActionProcessor processor = new MatchActionProcessor();
*
*     try {
*       processor.setFieldSeparator(";");
*       // Using a null pattern means to perform the action for every line.
*       processor.addAction(null, new MatchAction() {
*         public void processMatch(MatchActionInfo info) {
*           // We assume the second column exists
*           info.output.println(info.fields.elementAt(1));
*         }
*       });
*     } catch(MalformedPatternException e) {
*       e.printStackTrace();
*       System.exit(1);
*     }
*
*     try {
*       processor.processMatches(System.in, System.out);
*     } catch(IOException e) {
*       e.printStackTrace();
*       System.exit(1);
*     }
*   }
* }
*
* You can redirect the following sample input to stdin to test the code:
*
* 1;Trenton;New Jersey * 2;Annapolis;Maryland * 3;Austin;Texas * 4;Richmond;Virginia * 5;Harrisburg;Pennsylvania * 6;Honolulu;Hawaii * 7;Santa Fe;New Mexico ** * @version @version@ * @since 1.0 * @see MatchAction * @see MatchActionInfo */ public final class MatchActionProcessor { private Pattern __fieldSeparator = null; private PatternCompiler __compiler; private PatternMatcher __matcher; // If a pattern is null, it means to do it for every line. private Vector __patterns = new Vector(); private Vector __actions = new Vector(); private MatchAction __defaultAction = new DefaultMatchAction(); /** * Creates a new MatchActionProcessor instance initialized with the specified * pattern compiler and matcher. The field separator is set to null by * default, which means that matched lines will not be split into separate * fields unless the field separator is set with * {@link #setFieldSeparator setFieldSeparator()}. *
* @param compiler The PatternCompiler to use to compile registered * patterns. * @param matcher The PatternMatcher to use when searching for matches. */ public MatchActionProcessor(PatternCompiler compiler, PatternMatcher matcher) { __compiler = compiler; __matcher = matcher; } /** * Default constructor for MatchActionProcessor. Same as calling *
* MatchActionProcessor(new Perl5Compiler(), new Perl5Matcher());
*
*/
public MatchActionProcessor() {
this(new Perl5Compiler(), new Perl5Matcher());
}
/**
* Registers a pattern action pair, providing options to be used to
* compile the pattern. If a pattern is null, the action
* is performed for every line of input.
* * @param pattern The pattern to bind to an action. * @param options The compilation options to use for the pattern. * @param action The action to associate with the pattern. * @exception MalformedPatternException If the pattern cannot be compiled. */ public void addAction(String pattern, int options, MatchAction action) throws MalformedPatternException { if(pattern != null) __patterns.addElement(__compiler.compile(pattern, options)); else __patterns.addElement(null); __actions.addElement(action); } /** * Binds a patten to the default action, providing options to be * used to compile the pattern. The default action is to simply print * the matched line to the output. If a pattern is null, the action * is performed for every line of input. *
* @param pattern The pattern to bind to an action.
* @param options The compilation options to use for the pattern.
* @exception MalformedPatternException If the pattern cannot be compiled.
*/
public void addAction(String pattern, int options)
throws MalformedPatternException
{
addAction(pattern, options, __defaultAction);
}
/**
* Binds a pattern to the default action. The default action is to simply
* print the matched line to the output. If a pattern is null, the action
* is performed for every line of input.
*
* @param pattern The pattern to bind to an action. * @exception MalformedPatternException If the pattern cannot be compiled. */ public void addAction(String pattern) throws MalformedPatternException { addAction(pattern, 0); } /** * Registers a pattern action pair. If a pattern is null, the action * is performed for every line of input. *
* @param pattern The pattern to bind to an action. * @param action The action to associate with the pattern. * @exception MalformedPatternException If the pattern cannot be compiled. */ public void addAction(String pattern, MatchAction action) throws MalformedPatternException { addAction(pattern, 0, action); } /** * Sets the field separator to use when splitting a line into fields. * If the field separator is never set, or set to null, matched input * lines are not split into fields. *
* @param separator A regular expression defining the field separator. * @param options The options to use when compiling the separator. * @exception MalformedPatternException If the separator cannot be compiled. */ public void setFieldSeparator(String separator, int options) throws MalformedPatternException { if(separator == null) { __fieldSeparator = null; return; } __fieldSeparator = __compiler.compile(separator, options); } /** * Sets the field separator to use when splitting a line into fields. * If the field separator is never set, or set to null, matched input * lines are not split into fields. *
* @param separator A regular expression defining the field separator.
* @exception MalformedPatternException If the separator cannot be compiled.
*/
public void setFieldSeparator(String separator)
throws MalformedPatternException
{
setFieldSeparator(separator, 0);
}
/**
* This method reads the provided input one line at a time and for
* every registered pattern that is contained in the line it executes
* the associated MatchAction's processMatch() method. If a field
* separator has been defined with
* {@link #setFieldSeparator setFieldSeparator()}, the
* fields member of the MatchActionInfo instance passed to the
* processMatch() method is set to a List of Strings containing
* the split fields of the line. Otherwise the fields member is set
* to null. If no match was performed to invoke the action (i.e.,
* a null pattern was registered), then the match member is set
* to null. Otherwise, the match member will contain the result of
* the match.
*
* The input stream, having been exhausted, is closed right before the * method terminates and the output stream is flushed. *
* @see MatchActionInfo
* @param input The input stream from which to read lines.
* @param output Where to send output.
* @param encoding The character encoding of the InputStream source.
* If you also want to define an output character encoding,
* you should use {@link #processMatches(Reader, Writer)}
* and specify the encodings when creating the Reader and
* Writer sources and sinks.
* @exception IOException If an error occurs while reading input
* or writing output.
*/
public void processMatches(InputStream input, OutputStream output,
String encoding)
throws IOException
{
processMatches(new InputStreamReader(input, encoding),
new OutputStreamWriter(output));
}
/**
* This method reads the provided input one line at a time using the
* platform standard character encoding and for every registered
* pattern that is contained in the line it executes the associated
* MatchAction's processMatch() method. If a field separator has been
* defined with {@link #setFieldSeparator setFieldSeparator()}, the
* fields member of the MatchActionInfo instance passed to the
* processMatch() method is set to a List of Strings containing
* the split fields of the line. Otherwise the fields member is set
* to null. If no match was performed to invoke the action (i.e.,
* a null pattern was registered), then the match member is set
* to null. Otherwise, the match member will contain the result of
* the match.
*
*
* The input stream, having been exhausted, is closed right before the * method terminates and the output stream is flushed. *
*
* @see MatchActionInfo
* @param input The input stream from which to read lines.
* @param output Where to send output.
* @exception IOException If an error occurs while reading input
* or writing output.
*/
public void processMatches(InputStream input, OutputStream output)
throws IOException
{
processMatches(new InputStreamReader(input),
new OutputStreamWriter(output));
}
/**
* This method reads the provided input one line at a time and for
* every registered pattern that is contained in the line it executes
* the associated MatchAction's processMatch() method. If a field
* separator has been defined with
* {@link #setFieldSeparator setFieldSeparator()}, the
* fields member of the MatchActionInfo instance passed to the
* processMatch() method is set to a List of Strings containing
* the split fields of the line. Otherwise the fields member is set
* to null. If no match was performed to invoke the action (i.e.,
* a null pattern was registered), then the match member is set
* to null. Otherwise, the match member will contain the result of
* the match.
*
* The input stream, having been exhausted, is closed right before the * method terminates and the output stream is flushed. *
* @see MatchActionInfo
* @param input The input stream from which to read lines.
* @param output Where to send output.
* @exception IOException If an error occurs while reading input
* or writing output.
*/
public void processMatches(Reader input, Writer output)
throws IOException
{
int patternCount, current;
LineNumberReader reader = new LineNumberReader(input);
PrintWriter writer = new PrintWriter(output);
MatchActionInfo info = new MatchActionInfo();
Object obj;
Pattern pattern;
MatchAction action;
List fields = new ArrayList();
// Set those things that will not change.
info.matcher = __matcher;
info.fieldSeparator = __fieldSeparator;
info.input = reader;
info.output = writer;
info.fields = null;
patternCount = __patterns.size();
info.lineNumber = 0;
while((info.line = reader.readLine()) != null) {
info.charLine = info.line.toCharArray();
for(current=0; current < patternCount; current++) {
obj = __patterns.elementAt(current);
// If a pattern is null, it means to do it for every line.
if(obj != null) {
pattern = (Pattern)obj;
if(__matcher.contains(info.charLine, pattern)) {
info.match = __matcher.getMatch();
info.lineNumber = reader.getLineNumber();
info.pattern = pattern;
if(__fieldSeparator != null) {
fields.clear();
Util.split(fields, __matcher, __fieldSeparator, info.line);
info.fields = fields;
} else
info.fields = null;
action = (MatchAction)__actions.elementAt(current);
action.processMatch(info);
}
} else {
info.match = null;
info.lineNumber = reader.getLineNumber();
if(__fieldSeparator != null) {
fields.clear();
Util.split(fields, __matcher, __fieldSeparator, info.line);
info.fields = fields;
} else
info.fields = null;
action = (MatchAction)__actions.elementAt(current);
action.processMatch(info);
}
}
}
// Flush output but don't close, close input since we reached end.
writer.flush();
reader.close();
}
}
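
The loop in processMatches(Reader, Writer) above can be sketched without the ORO classes. The following is a minimal stand-in, not the library's implementation: it substitutes java.util.regex for PatternCompiler/PatternMatcher, and the names LineActionSketch, Action, process() and demo() are hypothetical. It keeps the same conventions: a null pattern fires for every line, fields are split only when a separator is set, and the output is flushed while the input is closed at the end.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for MatchActionProcessor built on java.util.regex.
final class LineActionSketch {
    /** Hypothetical callback; plays the role of MatchAction.processMatch(). */
    interface Action {
        void apply(int lineNumber, String line, List<String> fields, PrintWriter out);
    }

    private final List<Pattern> patterns = new ArrayList<>();
    private final List<Action> actions = new ArrayList<>();
    private Pattern fieldSeparator; // null means "do not split lines into fields"

    void setFieldSeparator(String regex) {
        fieldSeparator = (regex == null) ? null : Pattern.compile(regex);
    }

    // A null regex registers an action that runs for every line.
    void addAction(String regex, Action action) {
        patterns.add(regex == null ? null : Pattern.compile(regex));
        actions.add(action);
    }

    void process(Reader input, Writer output) throws IOException {
        LineNumberReader reader = new LineNumberReader(input);
        PrintWriter writer = new PrintWriter(output);
        String line;
        while ((line = reader.readLine()) != null) {
            for (int i = 0; i < patterns.size(); i++) {
                Pattern p = patterns.get(i);
                // find() mirrors contains(): the pattern may match anywhere in the line.
                if (p == null || p.matcher(line).find()) {
                    List<String> fields = (fieldSeparator == null)
                        ? null
                        : Arrays.asList(fieldSeparator.split(line));
                    actions.get(i).apply(reader.getLineNumber(), line, fields, writer);
                }
            }
        }
        // Flush output but do not close it; close input since it is exhausted.
        writer.flush();
        reader.close();
    }

    // Prints the second field of every line whose first field is 3, 5 or 7.
    static String demo() {
        LineActionSketch p = new LineActionSketch();
        p.setFieldSeparator(";");
        p.addAction("^[357];", (n, line, f, out) -> out.print(f.get(1) + "|"));
        StringWriter sink = new StringWriter();
        try {
            p.process(new StringReader(
                "1;Trenton;New Jersey\n3;Austin;Texas\n7;Santa Fe;New Mexico\n"), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Calling LineActionSketch.demo() runs the sketch over three of the sample records from the class documentation.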
jakarta-oro-2.0.8/src/java/org/apache/oro/text/GlobCompiler.java
/*
* $Id: GlobCompiler.java,v 1.8 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* Because there are various similar glob expression syntaxes, GlobCompiler * tries to provide a small amount of customization by providing the * {@link #STAR_CANNOT_MATCH_NULL_MASK} * and {@link #QUESTION_MATCHES_ZERO_OR_ONE_MASK} compilation options. *
* The GlobCompiler expression syntax is based on Unix shell glob expressions * but should be usable to simulate Win32 wildcards. The following syntax * is supported: *
* Please remember that when you construct a Java string in Java code,
* the backslash character is itself a special Java character, and it must
* be double backslashed to represent a single backslash in a regular
* expression.
*
* @version @version@
* @since 1.0
* @see org.apache.oro.text.regex.PatternCompiler
* @see org.apache.oro.text.regex.Perl5Matcher
*/
public final class GlobCompiler implements PatternCompiler {
/**
* The default mask for the {@link #compile compile} methods.
* It is equal to 0. The default behavior is for a glob expression to
* be case sensitive unless it is compiled with the CASE_INSENSITIVE_MASK
* option.
*/
public static final int DEFAULT_MASK = 0;
/**
* A mask passed as an option to the {@link #compile compile} methods
* to indicate a compiled glob expression should be case insensitive.
*/
public static final int CASE_INSENSITIVE_MASK = 0x0001;
/**
* A mask passed as an option to the {@link #compile compile} methods
* to indicate that a * should not be allowed to match the null string.
* The normal behavior of the * metacharacter is that it may match any
* 0 or more characters. This mask causes it to match 1 or more
* characters of anything.
*/
public static final int STAR_CANNOT_MATCH_NULL_MASK = 0x0002;
/**
* A mask passed as an option to the {@link #compile compile} methods
* to indicate that a ? should be allowed to match the null string.
* The normal behavior of the ? metacharacter is that it matches exactly
* 1 character. This mask causes it to match 0 or 1 characters.
*/
public static final int QUESTION_MATCHES_ZERO_OR_ONE_MASK = 0x0004;
/**
* A mask passed as an option to the {@link #compile compile} methods
* to indicate that the resulting Perl5Pattern should be treated as a
* read only data structure by Perl5Matcher, making it safe to share
* a single Perl5Pattern instance among multiple threads without needing
* synchronization.
Without this option, Perl5Matcher reserves the right * to store heuristic or other information in Perl5Pattern that might * accelerate future matches. When you use this option, Perl5Matcher will * not store or modify any information in a Perl5Pattern. Use this option * when you want to share a Perl5Pattern instance among multiple threads * using different Perl5Matcher instances. */ public static final int READ_ONLY_MASK = 0x0008; private Perl5Compiler __perl5Compiler; private static boolean __isPerl5MetaCharacter(char ch) { return (ch == '*' || ch == '?' || ch == '+' || ch == '[' || ch == ']' || ch == '(' || ch == ')' || ch == '|' || ch == '^' || ch == '$' || ch == '.' || ch == '{' || ch == '}' || ch == '\\'); } private static boolean __isGlobMetaCharacter(char ch) { return (ch == '*' || ch == '?' || ch == '[' || ch == ']'); } /** * This static method is the basic engine of the Glob PatternCompiler * implementation. It takes a glob expression in the form of a character * array and converts it into a String representation of a Perl5 pattern. * The method is made public so that programmers may use it for their * own purposes. However, the GlobCompiler compile methods work by * converting the glob pattern to a Perl5 pattern using this method, and * then invoking the compile() method of an internally stored Perl5Compiler * instance. *
* @param pattern A character array representation of a Glob pattern.
* @param options The glob compilation options. Only
* STAR_CANNOT_MATCH_NULL_MASK and
* QUESTION_MATCHES_ZERO_OR_ONE_MASK affect the translation.
* @return A String representation of a Perl5 pattern equivalent to the
* Glob pattern.
*/
public static String globToPerl5(char[] pattern, int options) {
boolean inCharSet, starCannotMatchNull = false, questionMatchesZero;
int ch;
StringBuffer buffer;
buffer = new StringBuffer(2*pattern.length);
inCharSet = false;
questionMatchesZero = ((options & QUESTION_MATCHES_ZERO_OR_ONE_MASK) != 0);
starCannotMatchNull = ((options & STAR_CANNOT_MATCH_NULL_MASK) != 0);
for(ch=0; ch < pattern.length; ch++) {
switch(pattern[ch]) {
case '*':
// Inside a character set, * is a literal.
if(inCharSet)
buffer.append('*');
else {
if(starCannotMatchNull)
buffer.append(".+");
else
buffer.append(".*");
}
break;
case '?':
// Inside a character set, ? is a literal.
if(inCharSet)
buffer.append('?');
else {
if(questionMatchesZero)
buffer.append(".?");
else
buffer.append('.');
}
break;
case '[':
inCharSet = true;
buffer.append(pattern[ch]);
// A leading ! or ^ negates the set; a leading ] is a literal member.
if(ch + 1 < pattern.length) {
switch(pattern[ch + 1]) {
case '!':
case '^':
buffer.append('^');
++ch;
continue;
case ']':
buffer.append(']');
++ch;
continue;
}
}
break;
case ']':
inCharSet = false;
buffer.append(pattern[ch]);
break;
case '\\':
// A backslash escapes a following glob metacharacter; otherwise
// it is emitted as a literal backslash.
buffer.append('\\');
if(ch == pattern.length - 1) {
buffer.append('\\');
} else if(__isGlobMetaCharacter(pattern[ch + 1]))
buffer.append(pattern[++ch]);
else
buffer.append('\\');
break;
default:
// Escape characters that are special in Perl5 but not in globs.
if(!inCharSet && __isPerl5MetaCharacter(pattern[ch]))
buffer.append('\\');
buffer.append(pattern[ch]);
break;
}
}
return buffer.toString();
}
/**
* The default GlobCompiler constructor. It initializes an internal
* Perl5Compiler instance to compile translated glob expressions.
*/
public GlobCompiler() {
__perl5Compiler = new Perl5Compiler();
}
/**
* Compiles a Glob expression into a Perl5Pattern instance that
* can be used by a Perl5Matcher object to perform pattern matching.
*
* @param pattern A Glob expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the glob expression. The flags * are a logical OR of any number of the 3 MASK * constants. For example: *
* regex =
* compiler.compile(pattern, GlobCompiler.
* CASE_INSENSITIVE_MASK |
* GlobCompiler.STAR_CANNOT_MATCH_NULL_MASK);
*
* This says to compile the pattern so that *
* cannot match the null string and to perform
* matches in a case insensitive manner.
* @return A Pattern instance constituting the compiled expression.
* This instance will always be a Perl5Pattern and can be reliably
* cast to a Perl5Pattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Glob expression.
*/
public Pattern compile(char[] pattern, int options)
throws MalformedPatternException
{
int perlOptions = 0;
if((options & CASE_INSENSITIVE_MASK) != 0)
perlOptions |= Perl5Compiler.CASE_INSENSITIVE_MASK;
if((options & READ_ONLY_MASK) != 0)
perlOptions |= Perl5Compiler.READ_ONLY_MASK;
return __perl5Compiler.compile(globToPerl5(pattern, options), perlOptions);
}
/**
* Same as calling compile(pattern, GlobCompiler.DEFAULT_MASK);
*
* @param pattern A Glob expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be a Perl5Pattern and can be reliably
* cast to a Perl5Pattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Glob expression.
*/
public Pattern compile(char[] pattern) throws MalformedPatternException {
return compile(pattern, DEFAULT_MASK);
}
/**
* Same as calling compile(pattern, GlobCompiler.DEFAULT_MASK);
*
* @param pattern A Glob expression to compile.
* @return A Pattern instance constituting the compiled regular expression.
* This instance will always be a Perl5Pattern and can be reliably
* cast to a Perl5Pattern.
* @exception MalformedPatternException If the compiled expression
* is not a valid Glob expression.
*/
public Pattern compile(String pattern) throws MalformedPatternException {
return compile(pattern.toCharArray(), DEFAULT_MASK);
}
/**
* Compiles a Glob expression into a Perl5Pattern instance that
* can be used by a Perl5Matcher object to perform pattern matching.
*
* @param pattern A Glob expression to compile. * @param options A set of flags giving the compiler instructions on * how to treat the glob expression. The flags * are a logical OR of any number of the 3 MASK * constants. For example: *
* regex = * compiler.compile("*.*", GlobCompiler. * CASE_INSENSITIVE_MASK | * GlobCompiler.STAR_CANNOT_MATCH_NULL_MASK); ** This says to compile the pattern so that * * cannot match the null string and to perform * matches in a case insensitive manner. * @return A Pattern instance constituting the compiled expression. * This instance will always be a Perl5Pattern and can be reliably * casted to a Perl5Pattern. * @exception MalformedPatternException If the compiled expression * is not a valid Glob expression. */ public Pattern compile(String pattern, int options) throws MalformedPatternException { return compile(pattern.toCharArray(), options); } } jakarta-oro-2.0.8/src/java/org/apache/oro/text/PatternCacheRandom.java 0000644 0001750 0001750 00000010554 07773723336 025137 0 ustar arnaud arnaud /* * $Id: PatternCacheRandom.java,v 1.7 2003/11/07 20:16:24 dfs Exp $ * * ==================================================================== * The Apache Software License, Version 1.1 * * Copyright (c) 2000 The Apache Software Foundation. All rights * reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. The end-user documentation included with the redistribution, * if any, must include the following acknowledgment: * "This product includes software developed by the * Apache Software Foundation (http://www.apache.org/)." * Alternately, this acknowledgment may appear in the software itself, * if and wherever such third-party acknowledgments normally appear. * * 4. 
The names "Apache" and "Apache Software Foundation", "Jakarta-Oro" * must not be used to endorse or promote products derived from this * software without prior written permission. For written * permission, please contact apache@apache.org. * * 5. Products derived from this software may not be called "Apache" * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their * name, without prior written permission of the Apache Software Foundation. * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see *
* @param capacity The capacity of the cache. * @param compiler The PatternCompiler to use to compile patterns. */ public PatternCacheRandom(int capacity, PatternCompiler compiler) { super(new CacheRandom(capacity), compiler); } /** * Same as: *
*/ public PatternCacheRandom(PatternCompiler compiler) { this(GenericPatternCache.DEFAULT_CAPACITY, compiler); } /** * Same as: ** PatternCacheRandom(GenericPatternCache.DEFAULT_CAPACITY, compiler); *
*/ public PatternCacheRandom(int capacity) { this(capacity, new Perl5Compiler()); } /** * Same as: ** PatternCacheRandom(capacity, new Perl5Compiler()); *
*/ public PatternCacheRandom() { this(GenericPatternCache.DEFAULT_CAPACITY); } } jakarta-oro-2.0.8/src/java/org/apache/oro/text/GenericPatternCache.java 0000644 0001750 0001750 00000020635 07773723336 025274 0 ustar arnaud arnaud /* * $Id: GenericPatternCache.java,v 1.7 2003/11/07 20:16:24 dfs Exp $ * * ==================================================================== * The Apache Software License, Version 1.1 * * Copyright (c) 2000 The Apache Software Foundation. All rights * reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. The end-user documentation included with the redistribution, * if any, must include the following acknowledgment: * "This product includes software developed by the * Apache Software Foundation (http://www.apache.org/)." * Alternately, this acknowledgment may appear in the software itself, * if and wherever such third-party acknowledgments normally appear. * * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro" * must not be used to endorse or promote products derived from this * software without prior written permission. For written * permission, please contact apache@apache.org. * * 5. Products derived from this software may not be called "Apache" * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their * name, without prior written permission of the Apache Software Foundation. 
* * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see ** PatternCacheRandom(GenericPatternCache.DEFAULT_CAPACITY); *
* @param cache The cache with which to store patterns. * @param compiler The PatternCompiler that should be used to compile * patterns. */ GenericPatternCache(Cache cache, PatternCompiler compiler) { _cache = cache; _compiler = compiler; } /** * Adds a pattern to the cache and returns the compiled pattern. This * method is in principle almost identical to * {@link #getPattern getPattern()} except for the fact that * it throws a MalformedPatternException if an expression cannot be * compiled. *
* addPattern() is meant to be used when you expressly intend to add * an expression to the cache and is useful for front-loading a cache * with expressions before use. If the expression added does not * already exist in the cache, it is compiled, added to the cache, * and returned. If the compiled expression is already in the cache, it * is simply returned. *
* The expected behavior of this method should be to start replacing * patterns in the cache only after the cache has been filled to capacity. *
* @param expression The regular expression to add to the cache. * @param options The compilation options to use when compiling the * expression. * @return The Pattern corresponding to the String representation of the * regular expression. * @exception MalformedPatternException If there is an error in compiling * the regular expression. */ public final synchronized Pattern addPattern(String expression, int options) throws MalformedPatternException { Object obj; Pattern pattern; obj = _cache.getElement(expression); if(obj != null) { pattern = (Pattern)obj; if(pattern.getOptions() == options) return pattern; } pattern = _compiler.compile(expression, options); _cache.addElement(expression, pattern); return pattern; } /** * Same as calling *
* @exception MalformedPatternException If there is an error in compiling * the regular expression. */ public final synchronized Pattern addPattern(String expression) throws MalformedPatternException { return addPattern(expression, 0); } /** * This method fetches a pattern from the cache. It is nearly identical * to {@link #addPattern addPattern()} except that it doesn't * throw a MalformedPatternException. If the pattern is not in the * cache, it is compiled, placed in the cache, and returned. If * the pattern cannot be compiled successfully, it * throws a MalformedCachePatternException. * Note that this exception is derived from RuntimeException, which means * you are NOT forced to catch it by the compiler. Please refer to * {@link MalformedCachePatternException} for a discussion of * when you should and shouldn't catch this exception. ** addPattern(expression, 0); *
* @param expression The regular expression to fetch from the cache in * compiled form. * @param options The compilation options to use when compiling the * expression. * @return The Pattern corresponding to the String representation of the * regular expression. * @exception MalformedCachePatternException If there is an error in * compiling the regular expression. */ public final synchronized Pattern getPattern(String expression, int options) throws MalformedCachePatternException { Pattern result = null; try { result = addPattern(expression, options); } catch(MalformedPatternException e) { throw new MalformedCachePatternException("Invalid expression: " + expression + "\n" + e.getMessage()); } return result; } /** * Same as calling *
*/ public final synchronized Pattern getPattern(String expression) throws MalformedCachePatternException { return getPattern(expression, 0); } /** * Returns the number of elements in the cache, not to be confused with * the {@link #capacity()} which returns the number * of elements that can be held in the cache at one time. ** getPattern(expression, 0) *
* @return The current size of the cache (i.e., the number of elements * currently cached). */ public final int size() { return _cache.size(); } /** * Returns the maximum number of patterns that can be cached at one time. *
* @return The maximum number of patterns that can be cached at one time.
*/
public final int capacity() { return _cache.capacity(); }
}
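
The lookup rule in addPattern() above (return the cached pattern only when its compilation options also match, otherwise compile and store) can be sketched on its own. This is an illustration under stated assumptions, not ORO code: it uses java.util.regex.Pattern in place of PatternCompiler, and a LinkedHashMap with simple first-in-first-out eviction in place of the Cache implementations. PatternCacheSketch and compileCount are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for GenericPatternCache built on java.util.regex.
final class PatternCacheSketch {
    private final int capacity;
    private final Map<String, Pattern> cache;
    int compileCount = 0; // exposed only so the sketch can be observed

    PatternCacheSketch(final int capacity) {
        this.capacity = capacity;
        // Insertion-ordered map that drops its oldest entry once capacity is
        // exceeded. Assumption: plain FIFO, not ORO's replacement policies.
        this.cache = new LinkedHashMap<String, Pattern>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached pattern if the expression and flags both match;
    // otherwise compiles, stores under the expression, and returns it.
    synchronized Pattern addPattern(String expression, int flags) {
        Pattern cached = cache.get(expression);
        if (cached != null && cached.flags() == flags) {
            return cached;
        }
        Pattern compiled = Pattern.compile(expression, flags);
        compileCount++;
        cache.put(expression, compiled);
        return compiled;
    }

    synchronized int size() { return cache.size(); }

    int capacity() { return capacity; }
}
```

Because the expression string alone is the key, asking for the same expression with different options replaces the earlier entry rather than caching both variants, which matches the behavior of the addPattern() code above.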
jakarta-oro-2.0.8/src/java/org/apache/oro/text/PatternCacheFIFO2.java
/*
* $Id: PatternCacheFIFO2.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * @param capacity The capacity of the cache.
 * @param compiler The PatternCompiler to use to compile patterns.
 */
public PatternCacheFIFO2(int capacity, PatternCompiler compiler) {
  super(new CacheFIFO2(capacity), compiler);
}

/**
 * Same as:
 *   PatternCacheFIFO2(GenericPatternCache.DEFAULT_CAPACITY, compiler);
 */
public PatternCacheFIFO2(PatternCompiler compiler) {
  this(GenericPatternCache.DEFAULT_CAPACITY, compiler);
}

/**
 * Same as:
 *   PatternCacheFIFO2(capacity, new Perl5Compiler());
 */
public PatternCacheFIFO2(int capacity) {
  this(capacity, new Perl5Compiler());
}

/**
 * Same as:
 *   PatternCacheFIFO2(GenericPatternCache.DEFAULT_CAPACITY);
 */
public PatternCacheFIFO2() {
  this(GenericPatternCache.DEFAULT_CAPACITY);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/package.html 0000644 0001750 0001750 00000000416 07773723336 023047 0 ustar arnaud arnaud
This package used to be the TextTools library and provides general text
processing support, including a glob regular expression class, pattern
caching and line-by-line processing classes.
jakarta-oro-2.0.8/src/java/org/apache/oro/text/MalformedCachePatternException.java 0000644 0001750 0001750 00000010212 07773723336 027473 0 ustar arnaud arnaud /*
 * $Id: MalformedCachePatternException.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
 *
 * ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2000 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 * if any, must include the following acknowledgment:
 * "This product includes software developed by the
 * Apache Software Foundation (http://www.apache.org/)."
 * Alternately, this acknowledgment may appear in the software itself,
 * if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
 * must not be used to endorse or promote products derived from this
 * software without prior written permission. For written
 * permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache"
 * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
 * name, without prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 *
* @param message A message indicating the nature of the error.
*/
public MalformedCachePatternException(String message) {
super(message);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/PatternCacheLRU.java 0000644 0001750 0001750 00000010677 07773723336 024367 0 ustar arnaud arnaud /*
* $Id: PatternCacheLRU.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * @param capacity The capacity of the cache.
 * @param compiler The PatternCompiler to use to compile patterns.
 */
public PatternCacheLRU(int capacity, PatternCompiler compiler) {
  super(new CacheLRU(capacity), compiler);
}

/**
 * Same as:
 *   PatternCacheLRU(GenericPatternCache.DEFAULT_CAPACITY, compiler);
 */
public PatternCacheLRU(PatternCompiler compiler) {
  this(GenericPatternCache.DEFAULT_CAPACITY, compiler);
}

/**
 * Same as:
 *   PatternCacheLRU(capacity, new Perl5Compiler());
 */
public PatternCacheLRU(int capacity) {
  this(capacity, new Perl5Compiler());
}

/**
 * Same as:
 *   PatternCacheLRU(GenericPatternCache.DEFAULT_CAPACITY);
 */
public PatternCacheLRU() {
  this(GenericPatternCache.DEFAULT_CAPACITY);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/PatternCacheFIFO.java 0000644 0001750 0001750 00000010576 07773723336 024446 0 ustar arnaud arnaud /*
 * $Id: PatternCacheFIFO.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
 *
 * ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2000 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 * if any, must include the following acknowledgment:
 * "This product includes software developed by the
 * Apache Software Foundation (http://www.apache.org/)."
 * Alternately, this acknowledgment may appear in the software itself,
 * if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
 * must not be used to endorse or promote products derived from this
 * software without prior written permission. For written
 * permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache"
 * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
 * name, without prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 *
 * @param capacity The capacity of the cache.
 * @param compiler The PatternCompiler to use to compile patterns.
 */
public PatternCacheFIFO(int capacity, PatternCompiler compiler) {
  super(new CacheFIFO(capacity), compiler);
}

/**
 * Same as:
 *   PatternCacheFIFO(GenericPatternCache.DEFAULT_CAPACITY, compiler);
 */
public PatternCacheFIFO(PatternCompiler compiler) {
  this(GenericPatternCache.DEFAULT_CAPACITY, compiler);
}

/**
 * Same as:
 *   PatternCacheFIFO(capacity, new Perl5Compiler());
 */
public PatternCacheFIFO(int capacity) {
  this(capacity, new Perl5Compiler());
}

/**
 * Same as:
 *   PatternCacheFIFO(GenericPatternCache.DEFAULT_CAPACITY);
 */
public PatternCacheFIFO() {
  this(GenericPatternCache.DEFAULT_CAPACITY);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/text/DefaultMatchAction.java 0000644 0001750 0001750 00000006101 07773723336 025125 0 ustar arnaud arnaud /*
 * $Id: DefaultMatchAction.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
 *
 * ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2000 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 * if any, must include the following acknowledgment:
 * "This product includes software developed by the
 * Apache Software Foundation (http://www.apache.org/)."
 * Alternately, this acknowledgment may appear in the software itself,
 * if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
 * must not be used to endorse or promote products derived from this
 * software without prior written permission. For written
 * permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache"
 * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
 * name, without prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 *
 * A PatternCache is an object that takes care of compiling, storing, and
 * retrieving regular expressions so that the programmer does not have to
 * explicitly manage these operations. The main benefit is the ease of use
 * that comes from only having to express regular expressions by their
 * String representations.
 *
 * @version @version@
 * @since 1.0
 * @see MalformedCachePatternException
 */
public interface PatternCache {

  /**
   * Adds a pattern to the cache and returns the compiled pattern. This
   * method is in principle almost identical to
   * {@link #getPattern(String)} except for the fact that
   * it throws a MalformedPatternException if an expression cannot be
   * compiled.
   *
   * addPattern() is meant to be used when you expressly intend to add
   * an expression to a cache and is useful for front-loading a cache
   * with expressions before use. If the expression added does not
   * already exist in the cache, it is compiled, added to the cache,
   * and returned. If the compiled expression is already in the cache, it
   * is simply returned.
   *
   * This method is expected to start replacing patterns in the cache
   * only after the cache has been filled to capacity.
   *
   * @param expression The regular expression to add to the cache.
   * @return The Pattern corresponding to the String representation of the
   * regular expression.
   * @exception MalformedPatternException If there is an error in compiling
   * the regular expression.
   */
  public Pattern addPattern(String expression)
    throws MalformedPatternException;

  /**
   * Adds a pattern to the cache and returns the compiled pattern. This
   * method is in principle almost identical to
   * {@link #getPattern(String)} except for the fact that
   * it throws a MalformedPatternException if an expression cannot be
   * compiled.
   *
   * addPattern() is meant to be used when you expressly intend to add
   * an expression to the cache and is useful for front-loading a cache
   * with expressions before use. If the expression added does not
   * already exist in the cache, it is compiled, added to the cache,
   * and returned. If the compiled expression is already in the cache, it
   * is simply returned.
   *
   * This method is expected to start replacing patterns in the cache
   * only after the cache has been filled to capacity.
   *
   * @param expression The regular expression to add to the cache.
   * @param options The compilation options to use when compiling the
   * expression.
   * @return The Pattern corresponding to the String representation of the
   * regular expression.
   * @exception MalformedPatternException If there is an error in compiling
   * the regular expression.
   */
  public Pattern addPattern(String expression, int options)
    throws MalformedPatternException;

  /**
   * This method fetches a pattern from the cache. It is nearly identical
   * to {@link #addPattern addPattern()} except that it doesn't
   * throw a MalformedPatternException. If the pattern is not in the
   * cache, it is compiled, placed in the cache, and returned. If
   * the pattern cannot be compiled successfully, the implementation must
   * throw an exception derived from MalformedCachePatternException.
   * Note that this exception is derived from RuntimeException, which means
   * the compiler does NOT force you to catch it. Please refer to
   * {@link MalformedCachePatternException} for a discussion of when you
   * should and shouldn't catch this exception.
   *
   * @param expression The regular expression to fetch from the cache in
   * compiled form.
   * @return The Pattern corresponding to the String representation of the
   * regular expression.
   * @exception MalformedCachePatternException If there is an error in
   * compiling the regular expression.
   */
  public Pattern getPattern(String expression)
    throws MalformedCachePatternException;

  /**
   * This method fetches a pattern from the cache. It is nearly identical
   * to {@link #addPattern addPattern()} except that it doesn't
   * throw a MalformedPatternException. If the pattern is not in the
   * cache, it is compiled, placed in the cache, and returned. If
   * the pattern cannot be compiled successfully, it
   * throws a MalformedCachePatternException.
   * Note that this exception is derived from RuntimeException, which means
   * the compiler does NOT force you to catch it. Please refer to
   * {@link MalformedCachePatternException} for a discussion of when you
   * should and shouldn't catch this exception.
   *
   * @param expression The regular expression to fetch from the cache in
   * compiled form.
   * @param options The compilation options to use when compiling the
   * expression.
   * @return The Pattern corresponding to the String representation of the
   * regular expression.
   * @exception MalformedCachePatternException If there is an error in
   * compiling the regular expression.
   */
  public Pattern getPattern(String expression, int options)
    throws MalformedCachePatternException;

  /**
   * Returns the number of elements in the cache, not to be confused with
   * the {@link #capacity()} which returns the number
   * of elements that can be held in the cache at one time.
   *
   * @return The current size of the cache (i.e., the number of elements
   * currently cached).
   */
  public int size();

  /**
   * Returns the maximum number of patterns that can be cached at one time.
   *
   * @return The maximum number of patterns that can be cached at one time.
   */
  public int capacity();
}
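// The PatternCache contract described above (compile on a miss, return the
// cached instance on a hit, replace entries only once the cache is full, and
// report failures with a checked exception from addPattern() but an unchecked
// one from getPattern()) can be sketched as follows. This is a minimal
// illustration, not Jakarta-ORO code: PatternCacheSketch and
// BadPatternException are hypothetical names, java.util.regex stands in for
// an ORO PatternCompiler, IllegalArgumentException stands in for
// MalformedCachePatternException, and the LRU policy comes from
// java.util.LinkedHashMap rather than ORO's own cache classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical stand-in for a PatternCache; not part of Jakarta-ORO. */
public class PatternCacheSketch {

    /** Checked exception playing the role of MalformedPatternException. */
    public static class BadPatternException extends Exception {
        public BadPatternException(String message) {
            super(message);
        }
    }

    private final int capacity;
    private final Map<String, Pattern> cache;

    public PatternCacheSketch(final int maxEntries) {
        this.capacity = maxEntries;
        // accessOrder = true keeps entries in least-recently-used order.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                // Replacement starts only after the cache exceeds capacity.
                return size() > maxEntries;
            }
        };
    }

    /** Compiles on a miss; returns the already cached instance on a hit. */
    public synchronized Pattern addPattern(String expression)
            throws BadPatternException {
        Pattern pattern = cache.get(expression);
        if (pattern == null) {
            try {
                pattern = Pattern.compile(expression);
            } catch (PatternSyntaxException e) {
                throw new BadPatternException(e.getMessage());
            }
            cache.put(expression, pattern);
        }
        return pattern;
    }

    /** Same lookup, but a bad expression surfaces as an unchecked exception. */
    public synchronized Pattern getPattern(String expression) {
        try {
            return addPattern(expression);
        } catch (BadPatternException e) {
            throw new IllegalArgumentException(e.getMessage());
        }
    }

    /** Number of patterns currently cached. */
    public synchronized int size() {
        return cache.size();
    }

    /** Maximum number of patterns held at one time. */
    public int capacity() {
        return capacity;
    }
}
```

// With hard-coded expressions, getPattern() keeps call sites free of
// try/catch blocks; for expressions entered at runtime, addPattern() makes
// the compiler insist that the failure case is handled.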
jakarta-oro-2.0.8/src/java/org/apache/oro/text/MatchAction.java 0000644 0001750 0001750 00000007136 07773723336 023631 0 ustar arnaud arnaud /*
* $Id: MatchAction.java,v 1.7 2003/11/07 20:16:24 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @see MatchActionProcessor
* @see MatchActionInfo
* @param matchInfo The match information associated with the line
* matched by MatchActionProcessor.
*/
public void processMatch(MatchActionInfo matchInfo);
}
jakarta-oro-2.0.8/src/java/org/apache/oro/io/ 0000755 0001750 0001750 00000000000 10423237774 020200 5 ustar arnaud arnaud jakarta-oro-2.0.8/src/java/org/apache/oro/io/RegexFilenameFilter.java 0000644 0001750 0001750 00000014213 07773723336 024735 0 ustar arnaud arnaud /*
* $Id: RegexFilenameFilter.java,v 1.9 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
 * @param regex The regular expression on which to filter.
 * @exception MalformedCachePatternException If there is an error in
 * compiling the regular expression. This need not be caught if
 * you are using a hard-coded expression that you know is correct.
 * But for robustness and reliability you should catch this exception
 * for dynamically entered expressions determined at runtime.
 */
public void setFilterExpression(String regex)
  throws MalformedCachePatternException
{
  _pattern = _cache.getPattern(regex);
}

/**
 * Set the regular expression on which to filter along with any
 * special options to use when compiling the expression.
 *
 * @param regex The regular expression on which to filter.
 * @param options A set of compilation options specific to the regular
 * expression grammar being used.
 * @exception MalformedCachePatternException If there is an error in
 * compiling the regular expression. This need not be caught if
 * you are using a hard-coded expression that you know is correct.
 * But for robustness and reliability you should catch this exception
 * for dynamically entered expressions determined at runtime.
 */
public void setFilterExpression(String regex, int options)
  throws MalformedCachePatternException
{
  _pattern = _cache.getPattern(regex, options);
}

/**
 * Filters a filename. Tests if the filename EXACTLY matches the pattern
 * contained by the filter. The directory argument is not examined.
 * Conforms to the java.io.FilenameFilter interface.
 *
 * @param dir The directory containing the file.
 * @param filename The name of the file.
 * @return True if the filename EXACTLY matches the pattern, false if not.
 */
public boolean accept(File dir, String filename) {
  synchronized(_matcher) {
    return _matcher.matches(filename, _pattern);
  }
}

/**
 * Filters a filename. Tests if the filename EXACTLY matches the pattern
 * contained by the filter. The filename is defined as pathname.getName().
 * Conforms to the java.io.FileFilter interface.
 *
 * @param pathname The file pathname.
 * @return True if the filename EXACTLY matches the pattern, false if not.
 */
public boolean accept(File pathname) {
  synchronized(_matcher) {
    return _matcher.matches(pathname.getName(), _pattern);
  }
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/io/package.html 0000644 0001750 0001750 00000000372 07773723336 022473 0 ustar arnaud arnaud
This package provides FilenameFilters that filter based on a regular
expression and other I/O-related classes that derive their functionality
from regular expressions.
jakarta-oro-2.0.8/src/java/org/apache/oro/io/AwkFilenameFilter.java 0000644 0001750 0001750 00000010674 07773723336 024414 0 ustar arnaud arnaud /*
 * $Id: AwkFilenameFilter.java,v 1.7 2003/11/07 20:16:23 dfs Exp $
 *
 * ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2000 The Apache Software Foundation. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 * if any, must include the following acknowledgment:
 * "This product includes software developed by the
 * Apache Software Foundation (http://www.apache.org/)."
 * Alternately, this acknowledgment may appear in the software itself,
 * if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
 * must not be used to endorse or promote products derived from this
 * software without prior written permission. For written
 * permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache"
 * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
 * name, without prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 * org.apache.oro.text.awk.AwkCompiler
*
* @param regex The regular expression on which to filter.
* @param options A set of compilation options.
* @exception MalformedCachePatternException If there is an error in
* compiling the regular expression. This need not be caught if
* you are using a hard-coded expression that you know is correct.
* But for robustness and reliability you should catch this exception
* for dynamically entered expressions determined at runtime.
*/
public AwkFilenameFilter(String regex, int options) {
super(__CACHE, __MATCHER, regex, options);
}
/** Same as AwkFilenameFilter(regex, AwkCompiler.DEFAULT_MASK); */
public AwkFilenameFilter(String regex) {
super(__CACHE, __MATCHER, regex);
}
/** Same as AwkFilenameFilter(""); */
public AwkFilenameFilter() {
super(__CACHE, __MATCHER);
}
}
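// The filename filters in this package accept a name only when the whole
// name matches the expression, and they never examine the directory
// argument. The sketch below shows that rule in isolation. It is an
// assumption-laden illustration rather than ORO code:
// ExactMatchFilenameFilter is a hypothetical class, and java.util.regex
// stands in for the ORO pattern cache and matcher, so a malformed expression
// raises the unchecked PatternSyntaxException instead of
// MalformedCachePatternException.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

/** Hypothetical java.util.regex stand-in; not the Jakarta-ORO class. */
public class ExactMatchFilenameFilter implements FilenameFilter {

    private final Pattern pattern;

    /** Compiles the expression once; a malformed regex fails here. */
    public ExactMatchFilenameFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /**
     * Accepts a filename only if the pattern matches all of it.
     * The directory argument is deliberately ignored.
     */
    @Override
    public boolean accept(File dir, String filename) {
        // matches() requires the entire input to match, unlike find().
        return pattern.matcher(filename).matches();
    }
}
```

// An instance can be passed to java.io.File.list(FilenameFilter) to list
// only the entries whose complete name matches the expression.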
jakarta-oro-2.0.8/src/java/org/apache/oro/io/GlobFilenameFilter.java 0000644 0001750 0001750 00000010637 07773723336 024554 0 ustar arnaud arnaud /*
* $Id: GlobFilenameFilter.java,v 1.7 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* org.apache.oro.text.GlobCompiler
*
* @param regex The regular expression on which to filter.
* @param options A set of compilation options.
* @exception MalformedCachePatternException If there is an error in
* compiling the regular expression. This need not be caught if
* you are using a hard-coded expression that you know is correct.
* But for robustness and reliability you should catch this exception
* for dynamically entered expressions determined at runtime.
*/
public GlobFilenameFilter(String regex, int options) {
super(__CACHE, __MATCHER, regex, options);
}
/** Same as GlobFilenameFilter(regex, GlobCompiler.DEFAULT_MASK); */
public GlobFilenameFilter(String regex) {
super(__CACHE, __MATCHER, regex);
}
/** Same as GlobFilenameFilter(""); */
public GlobFilenameFilter() {
super(__CACHE, __MATCHER);
}
}
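The filtering idea behind GlobFilenameFilter can be tried without the ORO jar. The sketch below is not the ORO implementation: it assumes the JDK's built-in "glob:" PathMatcher as a stand-in for GlobCompiler, and the class name GlobNameFilterSketch is hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical stand-in: a FilenameFilter driven by a JDK glob matcher. */
final class GlobNameFilterSketch implements FilenameFilter {
    private final PathMatcher matcher;

    GlobNameFilterSketch(String glob) {
        // getPathMatcher throws an unchecked PatternSyntaxException for a
        // malformed glob, so hard-coded patterns need no try/catch.
        this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public boolean accept(File dir, String name) {
        // Only the file name is examined; the directory is ignored.
        return matcher.matches(Paths.get(name));
    }
}
```

A filter like this would typically be passed to File.list(FilenameFilter), for example new File(".").list(new GlobNameFilterSketch("*.java")).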
jakarta-oro-2.0.8/src/java/org/apache/oro/io/Perl5FilenameFilter.java 0000644 0001750 0001750 00000010613 07773723336 024652 0 ustar arnaud arnaud /*
* $Id: Perl5FilenameFilter.java,v 1.7 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
* org.apache.oro.text.regex.Perl5Compiler
*
* @param regex The regular expression on which to filter.
* @param options A set of compilation options.
* @exception MalformedCachePatternException If there is an error in
* compiling the regular expression. This need not be caught if
* you are using a hard-coded expression that you know is correct.
* But for robustness and reliability you should catch this exception
* for dynamically entered expressions determined at runtime.
*/
public Perl5FilenameFilter(String regex, int options) {
super(__CACHE, __MATCHER, regex, options);
}
/** Same as Perl5FilenameFilter(regex, Perl5Compiler.DEFAULT_MASK); */
public Perl5FilenameFilter(String regex) {
super(__CACHE, __MATCHER, regex);
}
/** Same as Perl5FilenameFilter(""); */
public Perl5FilenameFilter() {
super(__CACHE, __MATCHER);
}
}
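The Javadoc above advises catching the compilation exception only for patterns entered at runtime. A minimal sketch of that pattern follows, assuming java.util.regex in place of Perl5Compiler and assuming that the whole file name must match; RegexNameFilterSketch and tryCreate are hypothetical names, not part of ORO.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical stand-in: a FilenameFilter backed by java.util.regex. */
final class RegexNameFilterSketch implements FilenameFilter {
    private final Pattern pattern;

    RegexNameFilterSketch(String regex) {
        // Unchecked PatternSyntaxException on a malformed expression.
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean accept(File dir, String name) {
        // Assumption: the entire name must match, not just a substring.
        return pattern.matcher(name).matches();
    }

    /** For dynamically entered expressions: returns null instead of throwing. */
    static RegexNameFilterSketch tryCreate(String regex) {
        try {
            return new RegexNameFilterSketch(regex);
        } catch (PatternSyntaxException e) {
            return null;
        }
    }
}
```

Hard-coded expressions can use the constructor directly, while user-supplied ones go through tryCreate so that a bad pattern does not escape as a runtime exception.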
jakarta-oro-2.0.8/src/java/org/apache/oro/util/ 0000755 0001750 0001750 00000000000 10423237774 020546 5 ustar arnaud arnaud jakarta-oro-2.0.8/src/java/org/apache/oro/util/CacheFIFO2.java 0000644 0001750 0001750 00000013370 07773723336 023156 0 ustar arnaud arnaud /*
* $Id: CacheFIFO2.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param capacity The capacity of the cache.
*/
public CacheFIFO2(int capacity) {
super(capacity);
__tryAgain = new boolean[_cache.length];
}
/**
* Same as:
* CacheFIFO2(GenericCache.DEFAULT_CAPACITY);
*/
public CacheFIFO2(){
this(GenericCache.DEFAULT_CAPACITY);
}
public synchronized Object getElement(Object key) {
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
entry = (GenericCacheEntry)obj;
__tryAgain[entry._index] = true;
return entry._value;
}
return null;
}
/**
* Adds a value to the cache. If the cache is full, when a new value
* is added to the cache, it replaces the first of the current values
* in the cache to have been added (i.e., FIFO2).
*
* @param key The key referencing the value added to the cache.
* @param value The value to add to the cache.
*/
public final synchronized void addElement(Object key, Object value) {
int index;
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
// Just replace the value. Technically this upsets the FIFO2 ordering,
// but it's expedient.
entry = (GenericCacheEntry)obj;
entry._value = value;
entry._key = key;
// Set the try again value to compensate.
__tryAgain[entry._index] = true;
return;
}
// If we haven't filled the cache yet, put it at the end.
if(!isFull()) {
index = _numEntries;
++_numEntries;
} else {
// Otherwise, find the next slot that doesn't have a second chance.
index = __current;
while(__tryAgain[index]) {
__tryAgain[index] = false;
if(++index >= __tryAgain.length)
index = 0;
}
__current = index + 1;
if(__current >= _cache.length)
__current = 0;
_table.remove(_cache[index]._key);
}
_cache[index]._value = value;
_cache[index]._key = key;
_table.put(key, _cache[index]);
}
}
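The second-chance replacement in addElement above can be exercised in isolation. The following is a simplified, self-contained sketch, not the ORO class: it keeps parallel arrays instead of GenericCacheEntry objects, and the names SecondChanceCacheSketch, get, and put are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of second-chance (FIFO2) replacement. */
final class SecondChanceCacheSketch<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final boolean[] tryAgain;   // one "second chance" bit per slot
    private final Map<K, Integer> slotOf = new HashMap<>();
    private int size = 0;
    private int current = 0;            // next eviction candidate

    SecondChanceCacheSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        keys = new Object[capacity];
        values = new Object[capacity];
        tryAgain = new boolean[capacity];
    }

    @SuppressWarnings("unchecked")
    synchronized V get(K key) {
        Integer slot = slotOf.get(key);
        if (slot == null) return null;
        tryAgain[slot] = true;          // a hit earns the slot a second chance
        return (V) values[slot];
    }

    synchronized void put(K key, V value) {
        Integer slot = slotOf.get(key);
        if (slot != null) {             // existing key: overwrite in place
            values[slot] = value;
            tryAgain[slot] = true;
            return;
        }
        int index;
        if (size < keys.length) {
            index = size++;             // cache not yet full: append
        } else {
            index = current;
            while (tryAgain[index]) {   // skip and clear second chances
                tryAgain[index] = false;
                index = (index + 1) % keys.length;
            }
            current = (index + 1) % keys.length;
            slotOf.remove(keys[index]);
        }
        keys[index] = key;
        values[index] = value;
        slotOf.put(key, index);
    }
}
```

With a capacity of 2, inserting "a" and "b", reading "a", and then inserting "c" evicts "b": the read set the second-chance bit on "a", so the scan clears that bit and moves on to the next slot.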
jakarta-oro-2.0.8/src/java/org/apache/oro/util/CacheFIFO.java 0000644 0001750 0001750 00000011450 07773723336 023071 0 ustar arnaud arnaud /*
* $Id: CacheFIFO.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param capacity The capacity of the cache.
*/
public CacheFIFO(int capacity) {
super(capacity);
}
/**
* Same as:
* CacheFIFO(GenericCache.DEFAULT_CAPACITY);
*/
public CacheFIFO(){
this(GenericCache.DEFAULT_CAPACITY);
}
/**
* Adds a value to the cache. If the cache is full, when a new value
* is added to the cache, it replaces the first of the current values
* in the cache to have been added (i.e., FIFO).
*
* @param key The key referencing the value added to the cache.
* @param value The value to add to the cache.
*/
public final synchronized void addElement(Object key, Object value) {
int index;
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
// Just replace the value. Technically this upsets the FIFO ordering,
// but it's expedient.
entry = (GenericCacheEntry)obj;
entry._value = value;
entry._key = key;
return;
}
// If we haven't filled the cache yet, put it at the end.
if(!isFull()) {
index = _numEntries;
++_numEntries;
} else {
// Otherwise, replace the current pointer, which takes care of
// FIFO in a circular fashion.
index = __curent;
if(++__curent >= _cache.length)
__curent = 0;
_table.remove(_cache[index]._key);
}
_cache[index]._value = value;
_cache[index]._key = key;
_table.put(key, _cache[index]);
}
}
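The circular-pointer eviction used above can be illustrated on its own. This is a reduced sketch under simplifying assumptions (a plain key array plus a HashMap rather than GenericCacheEntry slots); FifoCacheSketch and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of circular FIFO replacement. */
final class FifoCacheSketch<K, V> {
    private final Object[] keys;        // key stored in each slot
    private final Map<K, V> table = new HashMap<>();
    private int size = 0;
    private int current = 0;            // slot holding the oldest entry

    FifoCacheSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        keys = new Object[capacity];
    }

    synchronized V get(K key) {
        return table.get(key);          // reads do not affect eviction order
    }

    synchronized void put(K key, V value) {
        if (table.containsKey(key)) {   // existing key keeps its slot
            table.put(key, value);
            return;
        }
        int index;
        if (size < keys.length) {
            index = size++;             // cache not yet full: append
        } else {
            index = current;            // overwrite the oldest slot
            current = (current + 1) % keys.length;
            table.remove(keys[index]);
        }
        keys[index] = key;
        table.put(key, value);
    }

    synchronized int size() {
        return size;
    }
}
```

Unlike the second-chance variant, a read here never protects an entry: the oldest inserted key is always the next one replaced.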
jakarta-oro-2.0.8/src/java/org/apache/oro/util/Cache.java 0000644 0001750 0001750 00000006727 07773723336 022440 0 ustar arnaud arnaud /*
* $Id: Cache.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @return The current size of the cache (i.e., the number of elements
* currently cached).
*/
public int size();
/**
* Returns the maximum number of elements that can be cached at one time.
*
* @return The maximum number of elements that can be cached at one time.
*/
public int capacity();
}
jakarta-oro-2.0.8/src/java/org/apache/oro/util/package.html 0000644 0001750 0001750 00000000351 07773723336 023036 0 ustar arnaud arnaud
This package includes general classes required by {@link org.apache.oro.text} and related packages, but that can also be applied to more general uses.
jakarta-oro-2.0.8/src/java/org/apache/oro/util/GenericCacheEntry.java 0000644 0001750 0001750 00000006363 07773723336 024753 0 ustar arnaud arnaud /*
* $Id: GenericCacheEntry.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param capacity The capacity of the cache.
*/
public CacheLRU(int capacity) {
super(capacity);
int i;
__next = new int[_cache.length];
__prev = new int[_cache.length];
for(i=0; i < __next.length; i++)
__next[i] = __prev[i] = -1;
}
/**
* Same as:
* CacheLRU(GenericCache.DEFAULT_CAPACITY);
*/
public CacheLRU(){
this(GenericCache.DEFAULT_CAPACITY);
}
private void __moveToFront(int index) {
int next, prev;
if(__head != index) {
next = __next[index];
prev = __prev[index];
// Only the head has a prev entry that is an invalid index so
// we don't check.
__next[prev] = next;
// Make sure index is valid. If it isn't, we're at the tail
// and don't set __prev[next].
if(next >= 0)
__prev[next] = prev;
else
__tail = prev;
__prev[index] = -1;
__next[index] = __head;
__prev[__head] = index;
__head = index;
}
}
public synchronized Object getElement(Object key) {
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
entry = (GenericCacheEntry)obj;
// Maintain LRU property
__moveToFront(entry._index);
return entry._value;
}
return null;
}
/**
* Adds a value to the cache. If the cache is full, when a new value
* is added to the cache, it replaces the least recently used value
* in the cache (i.e., LRU).
*
* @param key The key referencing the value added to the cache.
* @param value The value to add to the cache.
*/
public final synchronized void addElement(Object key, Object value) {
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
// Just replace the value, but move it to the front.
entry = (GenericCacheEntry)obj;
entry._value = value;
entry._key = key;
__moveToFront(entry._index);
return;
}
// If we haven't filled the cache yet, place in next available spot
// and move to front.
if(!isFull()) {
if(_numEntries > 0) {
__prev[_numEntries] = __tail;
__next[_numEntries] = -1;
__moveToFront(_numEntries);
}
++_numEntries;
} else {
// We replace the tail of the list.
_table.remove(_cache[__tail]._key);
__moveToFront(__tail);
}
_cache[__head]._value = value;
_cache[__head]._key = key;
_table.put(key, _cache[__head]);
}
}
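The array-based doubly linked list that drives the LRU ordering above can be followed more easily in a stand-alone form. The sketch below mirrors the next/prev index arrays and the move-to-front step, but it is a simplified illustration and not the ORO class; LruCacheSketch, get, and put are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of LRU replacement over index arrays. */
final class LruCacheSketch<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final int[] next;           // next[i]: slot used less recently than i, or -1
    private final int[] prev;           // prev[i]: slot used more recently than i, or -1
    private final Map<K, Integer> slotOf = new HashMap<>();
    private int size = 0;
    private int head = 0;               // most recently used slot
    private int tail = 0;               // least recently used slot

    LruCacheSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        keys = new Object[capacity];
        values = new Object[capacity];
        next = new int[capacity];
        prev = new int[capacity];
        Arrays.fill(next, -1);
        Arrays.fill(prev, -1);
    }

    private void moveToFront(int index) {
        if (head == index) return;
        int n = next[index];
        int p = prev[index];
        next[p] = n;                    // p is valid because index is not the head
        if (n >= 0) prev[n] = p;
        else tail = p;                  // index was the tail
        prev[index] = -1;
        next[index] = head;
        prev[head] = index;
        head = index;
    }

    @SuppressWarnings("unchecked")
    synchronized V get(K key) {
        Integer slot = slotOf.get(key);
        if (slot == null) return null;
        moveToFront(slot);              // a hit makes the slot most recent
        return (V) values[slot];
    }

    synchronized void put(K key, V value) {
        Integer slot = slotOf.get(key);
        if (slot != null) {
            values[slot] = value;
            moveToFront(slot);
            return;
        }
        if (size < keys.length) {
            if (size > 0) {             // link the fresh slot behind the tail,
                prev[size] = tail;      // then promote it to the head
                next[size] = -1;
                moveToFront(size);
            }
            size++;
        } else {
            slotOf.remove(keys[tail]);  // evict the least recently used entry
            moveToFront(tail);
        }
        keys[head] = key;
        values[head] = value;
        slotOf.put(key, head);
    }
}
```

Keeping the list in two int arrays avoids allocating node objects on every access, at the cost of slightly less readable link manipulation.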
jakarta-oro-2.0.8/src/java/org/apache/oro/util/CacheRandom.java 0000644 0001750 0001750 00000011273 07773723336 023571 0 ustar arnaud arnaud /*
* $Id: CacheRandom.java,v 1.7 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param capacity The capacity of the cache.
*/
public CacheRandom(int capacity) {
super(capacity);
__random = new Random(System.currentTimeMillis());
}
/**
* Same as:
* CacheRandom(GenericCache.DEFAULT_CAPACITY);
*/
public CacheRandom(){
this(GenericCache.DEFAULT_CAPACITY);
}
/**
* Adds a value to the cache. If the cache is full, when a new value
* is added to the cache, it replaces a randomly selected value
* in the cache (i.e., Random).
*
* @param key The key referencing the value added to the cache.
* @param value The value to add to the cache.
*/
public final synchronized void addElement(Object key, Object value) {
int index;
Object obj;
obj = _table.get(key);
if(obj != null) {
GenericCacheEntry entry;
// Just replace the value.
entry = (GenericCacheEntry)obj;
entry._value = value;
entry._key = key;
return;
}
// Key is not in cache.
// If we haven't filled the cache yet, put it at the end.
if(!isFull()) {
index = _numEntries;
++_numEntries;
} else {
// Otherwise, replace a random entry.
index = (int)(_cache.length*__random.nextFloat());
_table.remove(_cache[index]._key);
}
_cache[index]._value = value;
_cache[index]._key = key;
_table.put(key, _cache[index]);
}
}
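Random replacement is the simplest of the policies shown here, and a small stand-alone sketch makes it easy to experiment with. The version below is an illustration only: it takes an explicit seed so that runs are repeatable and uses Random.nextInt(bound) to pick the victim slot, both of which are choices of this sketch. RandomCacheSketch and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical stand-alone sketch of random replacement. */
final class RandomCacheSketch<K, V> {
    private final Object[] keys;        // key stored in each slot
    private final Map<K, V> table = new HashMap<>();
    private final Random random;
    private int size = 0;

    RandomCacheSketch(int capacity, long seed) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        keys = new Object[capacity];
        random = new Random(seed);      // seeded for repeatable behaviour
    }

    synchronized V get(K key) {
        return table.get(key);
    }

    synchronized void put(K key, V value) {
        if (table.containsKey(key)) {   // existing key keeps its slot
            table.put(key, value);
            return;
        }
        int index;
        if (size < keys.length) {
            index = size++;             // cache not yet full: append
        } else {
            index = random.nextInt(keys.length);  // uniform victim slot
            table.remove(keys[index]);
        }
        keys[index] = key;
        table.put(key, value);
    }

    synchronized int size() {
        return size;
    }
}
```

Because the victim is chosen without regard to age or use, no per-entry bookkeeping is needed, which keeps both reads and writes cheap.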
jakarta-oro-2.0.8/src/java/org/apache/oro/util/GenericCache.java 0000644 0001750 0001750 00000012240 07773723336 023720 0 ustar arnaud arnaud /*
* $Id: GenericCache.java,v 1.8 2003/11/07 20:16:25 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param capacity The maximum capacity of the cache.
*/
GenericCache(int capacity) {
_numEntries = 0;
_table = new HashMap(capacity);
_cache = new GenericCacheEntry[capacity];
while(--capacity >= 0)
_cache[capacity] = new GenericCacheEntry(capacity);
}
public abstract void addElement(Object key, Object value);
public synchronized Object getElement(Object key) {
Object obj;
obj = _table.get(key);
if(obj != null)
return ((GenericCacheEntry)obj)._value;
return null;
}
public final Iterator keys() {
return _table.keySet().iterator();
}
/**
* Returns the number of elements in the cache, not to be confused with
* the {@link #capacity()} which returns the number
* of elements that can be held in the cache at one time.
*
* @return The current size of the cache (i.e., the number of elements
* currently cached).
*/
public final int size() {
return _numEntries;
}
/**
* Returns the maximum number of elements that can be cached at one time.
*
* @return The maximum number of elements that can be cached at one time.
*/
public final int capacity() {
return _cache.length;
}
public final boolean isFull() {
return (_numEntries >= _cache.length);
}
}
jakarta-oro-2.0.8/src/java/org/apache/oro/overview.html 0000644 0001750 0001750 00000000773 07773723336 022344 0 ustar arnaud arnaud
The Jakarta-ORO library contains packages for performing general text processing in Java, with an aim to support, though not specifically limited to, servlet development. The core package is {@link org.apache.oro.text.regex}, which defines abstract interfaces for manipulating regular expressions, as well as a set of Perl5 compatible regular expression classes. Developers will mostly be interested only in that package.
jakarta-oro-2.0.8/src/java/examples/ 0000755 0001750 0001750 00000000000 10423237774 016600 5 ustar arnaud arnaud
jakarta-oro-2.0.8/src/java/examples/awk/ 0000755 0001750 0001750 00000000000 10423237774 017362 5 ustar arnaud arnaud
jakarta-oro-2.0.8/src/java/examples/awk/streamInputExample.txt 0000644 0001750 0001750 00000001454 07773723336 023766 0 ustar arnaud arnaud
Many programmers believe C++ is too complicated for its own good and prefer to avoid its more obscure and confusing features. In fact, some programmers are so fed up with the language that they will only program in Java, even though Java is still very immature and dog-slow. That is not to say that Java is necessarily a better language than C++, but rather that Java simply has a stronger appeal to the tired C++ programmer. C++ is an object-oriented descendent of C. Being derived from C gave it one marvelous feature that Java lacks: the C preprocessor. C++ programmers that have converted to Java are banging their heads against their keyboards because they do not have a true conditional compilation mechanism. Of course, the lack of enumerations is also a great pain, although tolerable to some.
jakarta-oro-2.0.8/src/java/examples/awk/splitExample.java 0000644 0001750 0001750 00000012325 07773723336 022707 0 ustar arnaud arnaud /*
* $Id: splitExample.java,v 1.8 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*strings
command,
* but is intended to show how matching on a stream is affected by its
* character encoding. The most important thing to remember is that
* AwkMatcher only matches on 8-bit values. If your input contains
* Java characters containing values greater than 255, the pattern
* matching process will result in an ArrayIndexOutOfBoundsException.
* Therefore, if you want to search a binary file containing arbitrary
* bytes, you have to make sure you use an 8-bit character encoding
* like ISO-8859-1, so that the mapping between byte values and character
* values will be one-to-one. If the file is instead decoded with a
* multi-byte encoding such as UTF-8, you will probably wind up with
* character values outside of the 8-bit range.
*
* @version @version@
*/
public final class strings {
public static final class StringFinder {
/**
* Default string expression. Looks for at least 4 contiguous
* printable characters. Differs slightly from GNU strings command
* in that any printable character may start a string.
*/
public static final String DEFAULT_PATTERN =
"[\\x20-\\x7E]{3}[\\x20-\\x7E]+";
Pattern pattern;
AwkMatcher matcher;
public StringFinder(String regex) throws MalformedPatternException {
AwkCompiler compiler = new AwkCompiler();
pattern = compiler.compile(regex, AwkCompiler.CASE_INSENSITIVE_MASK);
matcher = new AwkMatcher();
}
public StringFinder() throws MalformedPatternException {
this(DEFAULT_PATTERN);
}
public void search(Reader input, PrintWriter output) throws IOException {
MatchResult result;
AwkStreamInput in = new AwkStreamInput(input);
while(matcher.contains(in, pattern)) {
result = matcher.getMatch();
output.println(result);
}
output.flush();
}
}
public static final String DEFAULT_ENCODING = "ISO-8859-1";
public static final void main(String args[]) {
String regex = StringFinder.DEFAULT_PATTERN;
String filename, encoding = DEFAULT_ENCODING;
StringFinder finder;
Reader file = null;
// Some users thought it would be useful to use the default pattern
// and just pass the encoding as the second parameter. Therefore,
// when two arguments are given and the second argument is not a valid
// encoding, it is interpreted as a pattern. This means you can't
// use a valid encoding name as a pattern without also specifying
// an encoding as a third argument.
if(args.length < 1) {
System.err.println("usage: strings file [pattern|encoding] [encoding]");
return;
} else if(args.length > 2) {
regex = args[1];
encoding = args[2];
} else if(args.length > 1)
encoding = args[1];
filename = args[0];
try {
InputStream fin = new FileInputStream(filename);
try {
try {
file = new InputStreamReader(fin, encoding);
} catch(UnsupportedEncodingException uee) {
if(args.length == 2) {
// The second argument is not an encoding, so treat it as the pattern.
regex = encoding;
encoding = DEFAULT_ENCODING;
file = new InputStreamReader(fin, encoding);
} else
throw uee;
}
finder = new StringFinder(regex);
finder.search(file, new PrintWriter(new OutputStreamWriter(System.out)));
} finally {
// Release the file even if creating the reader or the search fails.
if(file != null)
file.close();
else
fin.close();
}
} catch(Exception e) {
e.printStackTrace();
return;
}
}
}
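The encoding point made in the class comment above can be checked with the JDK alone. The following is a minimal sketch, not part of Jakarta-ORO: the class name `EncodingRangeSketch` and the helper `maxCharValue` are hypothetical names introduced for illustration. It decodes the same three bytes with two encodings and reports the largest character value the `Reader` produced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class for illustration; not part of Jakarta-ORO.
final class EncodingRangeSketch {

    // Decodes the bytes with the named encoding and returns the largest
    // char value the Reader produced.
    static int maxCharValue(byte[] data, String encoding) throws IOException {
        Reader reader =
            new InputStreamReader(new ByteArrayInputStream(data), encoding);
        int max = 0;
        int c;
        try {
            while ((c = reader.read()) != -1) {
                if (c > max) {
                    max = c;
                }
            }
        } finally {
            reader.close();
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // 0xE2 0x82 0xAC is the UTF-8 encoding of U+20AC (the euro sign).
        byte[] data = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };

        // ISO-8859-1 maps each byte to the char with the same value,
        // so every char stays within 0..255.
        System.out.println("ISO-8859-1 max: " + maxCharValue(data, "ISO-8859-1"));

        // UTF-8 combines the three bytes into one char above 255.
        System.out.println("UTF-8 max: " + maxCharValue(data, "UTF-8"));
    }
}
```

With ISO-8859-1 the three bytes become three characters no larger than 0xE2, while UTF-8 folds them into the single character U+20AC, which is the kind of value the comment warns AwkMatcher cannot handle.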
jakarta-oro-2.0.8/src/java/examples/matchResultExample.java 0000644 0001750 0001750 00000014756 07773723336 023277 0 ustar arnaud arnaud /*
* $Id: matchResultExample.java,v 1.7 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* @param args The array of arguments to the program. The first
* argument should be a Perl5 regular expression, and the second
* should be an input string.
*/
public static final void main(String args[]) {
int groups;
PatternMatcher matcher;
PatternCompiler compiler;
Pattern pattern = null;
PatternMatcherInput input;
MatchResult result;
// Must have at least two arguments, else exit.
if(args.length < 2) {
System.err.println("Usage: matchResult pattern input");
return;
}
// Create Perl5Compiler and Perl5Matcher instances.
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
// Attempt to compile the pattern. If the pattern is not valid,
// report the error and exit.
try {
pattern = compiler.compile(args[0]);
} catch(MalformedPatternException e) {
System.err.println("Bad pattern.");
System.err.println(e.getMessage());
return;
}
// Create a PatternMatcherInput instance to keep track of the position
// where the last match finished, so that the next match search will
// start from there. You always create a PatternMatcherInput instance
// when you want to search a string for all of the matches it contains,
// and not just the first one.
input = new PatternMatcherInput(args[1]);
// Loop until there are no more matches left.
while(matcher.contains(input, pattern)) {
// Since we're still in the loop, fetch the match that was found.
result = matcher.getMatch();
// Perform whatever processing on the result you want.
// Here we just print out all its elements to show how the
// MatchResult methods are used.
// The toString() method is provided as a convenience method.
// It returns the entire match. The following are all equivalent:
// System.out.println("Match: " + result);
// System.out.println("Match: " + result.toString());
// System.out.println("Match: " + result.group(0));
System.out.println("Match: " + result.toString());
// Print the length of the match. The length() method is another
// convenience method. The lengths of subgroups can be obtained
// by first retrieving the subgroup and then calling the string's
// length() method.
System.out.println("Length: " + result.length());
// Retrieve the number of matched groups. A group corresponds to
// a parenthesized set in a pattern.
groups = result.groups();
System.out.println("Groups: " + groups);
// Print the offset into the input of the beginning and end of the
// match. The beginOffset() and endOffset() methods return the
// offsets of a group relative to the beginning of the input. The
// begin() and end() methods return the offsets of a group relative
// to the beginning of a match.
System.out.println("Begin offset: " + result.beginOffset(0));
System.out.println("End offset: " + result.endOffset(0));
System.out.println("Groups: ");
// Print the contents of each matched subgroup along with their
// offsets relative to the beginning of the entire match.
// Start at 1 because we just printed out group 0
for(int group = 1; group < groups; group++) {
System.out.println(group + ": " + result.group(group));
System.out.println("Begin: " + result.begin(group));
System.out.println("End: " + result.end(group));
}
}
}
}
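The difference between input-relative and match-relative offsets described in the comments above can be sketched without the ORO classes. The example below swaps in the JDK's `java.util.regex` package in place of `Perl5Matcher`, so it shows the idea only, not ORO's API. The class `OffsetSketch` and its two helpers are hypothetical names. `absoluteBegin` plays the role the comments give to `beginOffset()`, and `relativeBegin` the role of `begin()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not Jakarta-ORO.
final class OffsetSketch {

    // Offset of a group measured from the start of the whole input.
    static int absoluteBegin(Matcher m, int group) {
        return m.start(group);
    }

    // Offset of a group measured from the start of the overall match.
    // Assumes the group took part in the match (start(group) != -1).
    static int relativeBegin(Matcher m, int group) {
        return m.start(group) - m.start();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher("ids: 12-345");
        while (m.find()) {
            System.out.println("Match: " + m.group());
            // Unlike the loop in the example above, groupCount() here does
            // not count group 0, so the bound is inclusive.
            for (int group = 1; group <= m.groupCount(); group++) {
                System.out.println(group + ": " + m.group(group));
                System.out.println("Input-relative begin: " + absoluteBegin(m, group));
                System.out.println("Match-relative begin: " + relativeBegin(m, group));
            }
        }
    }
}
```

For the input `ids: 12-345` the match starts at offset 5, so the second group begins at offset 8 in the input but at offset 3 within the match.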
jakarta-oro-2.0.8/src/java/examples/filter.java 0000644 0001750 0001750 00000007534 07773723336 020751 0 ustar arnaud arnaud /*
* $Id: filter.java,v 1.8 2003/11/07 20:16:23 dfs Exp $
*
* ====================================================================
* The Apache Software License, Version 1.1
*
* Copyright (c) 2000 The Apache Software Foundation. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. The end-user documentation included with the redistribution,
* if any, must include the following acknowledgment:
* "This product includes software developed by the
* Apache Software Foundation (http://www.apache.org/)."
* Alternately, this acknowledgment may appear in the software itself,
* if and wherever such third-party acknowledgments normally appear.
*
* 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
* must not be used to endorse or promote products derived from this
* software without prior written permission. For written
* permission, please contact apache@apache.org.
*
* 5. Products derived from this software may not be called "Apache"
* or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
* name, without prior written permission of the Apache Software Foundation.
*
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
* ====================================================================
*
* This software consists of voluntary contributions made by many
* individuals on behalf of the Apache Software Foundation. For more
* information on the Apache Software Foundation, please see
*
* This is a simple program that takes a javadoc-generated HTML file as
* input and produces as output the same HTML file, except with a white
* background color for the body.
*
*
* #!/usr/bin/perl
*
* $#ARGV >= 1 || die "Usage: jdfix input output\n";
*
* open(INPUT, $ARGV[0]) || warn "Couldn't open $ARGV[0]\n";
* open(OUTPUT, ">$ARGV[1]") || warn "Couldn't open $ARGV[1]\n";
*
* while(){
* s///;
* print OUTPUT;
* }
*
* close(INPUT);
* close(OUTPUT);
*
*/
public static final void main(String args[]) {
String line;
BufferedReader input = null;
PrintWriter output = null;
Perl5Util perl;
StringBuffer result = new StringBuffer();
int numSubs = 0;
if(args.length < 2) {
System.err.println("Usage: jdfix input output");
return;
}
try {
input = new BufferedReader(new FileReader(args[0]));
} catch(IOException e) {
System.err.println("Error opening input file: " + args[0]);
e.printStackTrace();
return;
}
try {
output = new PrintWriter(new FileWriter(args[1]));
} catch(IOException e) {
System.err.println("Error opening output file: " + args[1]);
e.printStackTrace();
return;
}
perl = new Perl5Util();
try {
while((line = input.readLine()) != null) {
numSubs+=perl.substitute(result, "s///", line);
result.append('\n');
}
output.print(result.toString());
System.out.println("Substitutions made: " + numSubs);
} catch(IOException e) {
// Only reading the input file can fail here, so report args[0].
System.err.println("Error reading from input: " + args[0]);
e.printStackTrace();
return;
} finally {
try {
input.close();
output.close();
} catch(IOException e) {
System.err.println("Error closing files.");
e.printStackTrace();
return;
}
}
}
}
jakarta-oro-2.0.8/src/java/examples/didNotMatch.java 0000644 0001750 0001750 00000007640 07773723336 021660 0 ustar arnaud arnaud /* * $Id: didNotMatch.java,v 1.7 2003/11/07 20:16:23 dfs Exp $ * * ==================================================================== * The Apache Software License, Version 1.1 * * Copyright (c) 2000 The Apache Software Foundation. All rights * reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. The end-user documentation included with the redistribution, * if any, must include the following acknowledgment: * "This product includes software developed by the * Apache Software Foundation (http://www.apache.org/)." * Alternately, this acknowledgment may appear in the software itself, * if and wherever such third-party acknowledgments normally appear. * * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro" * must not be used to endorse or promote products derived from this * software without prior written permission. For written * permission, please contact apache@apache.org. * * 5. Products derived from this software may not be called "Apache" * or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their * name, without prior written permission of the Apache Software Foundation. * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* ==================================================================== * * This software consists of voluntary contributions made by many * individuals on behalf of the Apache Software Foundation. For more * information on the Apache Software Foundation, please see *