ox-2.8.2/ 0000755 0000041 0000041 00000000000 13203413064 012175 5 ustar www-data www-data ox-2.8.2/lib/ 0000755 0000041 0000041 00000000000 13203413063 012742 5 ustar www-data www-data ox-2.8.2/lib/ox.rb 0000644 0000041 0000041 00000003361 13203413063 013720 0 ustar www-data www-data # Copyright (c) 2011, Peter Ohler
# All rights reserved.
#
# === Description:
#
# Ox handles XML documents in two ways. It is a generic XML parser and writer as
# well as a fast Object / XML marshaller. Ox was written for speed as a
# replacement for Nokogiri and for Marshal.
#
# As an XML parser it is 2 or more times faster than Nokogiri and as a generic
# XML writer it is 14 times faster than Nokogiri. Of course different files may
# result in slightly different times.
#
# As an Object serializer Ox is 4 times faster than the standard Ruby
# Marshal.dump(). Ox is 3 times faster than Marshal.load().
#
# === Object Dump Sample:
#
# require 'ox'
#
# class Sample
# attr_accessor :a, :b, :c
#
# def initialize(a, b, c)
# @a = a
# @b = b
# @c = c
# end
# end
#
# # Create Object
# obj = Sample.new(1, "bee", ['x', :y, 7.0])
# # Now dump the Object to an XML String.
# xml = Ox.dump(obj)
# # Convert the object back into a Sample Object.
# obj2 = Ox.parse_obj(xml)
#
# === Generic XML Writing and Parsing:
#
# require 'ox'
#
# doc = Ox::Document.new(:version => '1.0')
#
# top = Ox::Element.new('top')
# top[:name] = 'sample'
# doc << top
#
# mid = Ox::Element.new('middle')
# mid[:name] = 'second'
# top << mid
#
# bot = Ox::Element.new('bottom')
# bot[:name] = 'third'
# mid << bot
#
# xml = Ox.dump(doc)
# puts xml
# doc2 = Ox.parse(xml)
# puts "Same? #{doc == doc2}"
module Ox
end
require 'ox/version'
require 'ox/error'
require 'ox/hasattrs'
require 'ox/node'
require 'ox/comment'
require 'ox/raw'
require 'ox/instruct'
require 'ox/cdata'
require 'ox/doctype'
require 'ox/element'
require 'ox/document'
require 'ox/bag'
require 'ox/sax'
require 'ox/ox' # C extension
ox-2.8.2/lib/ox/ 0000755 0000041 0000041 00000000000 13203413063 013370 5 ustar www-data www-data ox-2.8.2/lib/ox/instruct.rb 0000644 0000041 0000041 00000002372 13203413063 015574 0 ustar www-data www-data
module Ox
# An Instruct represents a processing instruction of an XML document. It has a target, attributes, and a value or
# content. The content will be all characters with the exception of the target. If the content follows a regular
# attribute format then the attributes will be set to the parsed values. If it does not follow the attribute format
# then the attributes will be empty.
class Instruct < Node
include HasAttrs
# The content of the processing instruction.
attr_accessor :content
# Creates a new Instruct with the specified name.
# - +name+ [String] name of the Instruct
def initialize(name)
super
@attributes = nil
@content = nil
end
alias target value
# Returns true if this Object and other are of the same type and have the
# equivalent value and the equivalent elements otherwise false is returned.
# - +other+ [Object] Object compare _self_ to.
# *return* [Boolean] true if both Objects are equivalent, otherwise false.
def eql?(other)
return false unless super(other)
return false unless self.attributes == other.attributes
return false unless self.content == other.content
true
end
alias == eql?
end # Instruct
end # Ox
ox-2.8.2/lib/ox/cdata.rb 0000644 0000041 0000041 00000000364 13203413063 014774 0 ustar www-data www-data
module Ox
# CData represents a CDATA element in an XML document.
class CData < Node
# Creates a CDATA element.
# - +value+ [String] value for the CDATA contents
def initialize(value)
super
end
end # CData
end # Ox
ox-2.8.2/lib/ox/comment.rb 0000644 0000041 0000041 00000000470 13203413063 015360 0 ustar www-data www-data
module Ox
# Comments represent XML comments in an XML document. A comment has a value
# attribute only.
class Comment < Node
# Creates a new Comment with the specified value.
# - +value+ [String] string value for the comment
def initialize(value)
super
end
end # Comment
end # Ox
ox-2.8.2/lib/ox/sax.rb 0000644 0000041 0000041 00000005633 13203413063 014517 0 ustar www-data www-data
module Ox
# A SAX style parse handler. The Ox::Sax handler class should be subclassed
# and then used with the Ox.sax_parse() method. The Sax methods will then be
# called as the file is parsed. This is best suited for very large files or
# IO streams.
#
# *Example*
#
# require 'ox'
#
# class MySax < ::Ox::Sax
# def initialize()
# @element_names = []
# end
#
# def start_element(name)
# @element_names << name
# end
# end
#
# any = MySax.new()
# File.open('any.xml', 'r') do |f|
# Ox.sax_parse(any, f)
# end
#
# To make the desired methods active while parsing the desired method should
# be made public in the subclasses. If the methods remain private they will
# not be called during parsing. The 'name' argument in the callback methods
# will be a Symbol. The 'str' arguments will be a String. The 'value'
# arguments will be Ox::Sax::Value objects. Since both the text() and the
# value() methods are called for the same element in the XML document, the
# text() method is ignored if the value() method is defined or public. The
# same is true for attr() and attr_value(). When all attributes have been read
# the attrs_done() callback will be invoked.
#
# def instruct(target); end
# def end_instruct(target); end
# def attr(name, str); end
# def attr_value(name, value); end
# def attrs_done(); end
# def doctype(str); end
# def comment(str); end
# def cdata(str); end
# def text(str); end
# def value(value); end
# def start_element(name); end
# def end_element(name); end
# def error(message, line, column); end
# def abort(name); end
#
# Initializing _line_ attribute in the initializer will cause that variable to
# be updated before each callback with the XML line number. The same is true
# for the _column_ attribute but it will be updated with the column in the XML
# file that is the start of the element or node just read. @pos if defined
# will hold the number of bytes from the start of the document.
class Sax
# Create a new instance of the Sax handler class.
def initialize()
#@pos = nil
#@line = nil
#@column = nil
end
# To make the desired methods active while parsing the desired method
# should be made public in the subclasses. If the methods remain private
# they will not be called during parsing.
private
def instruct(target)
end
def end_instruct(target)
end
def attr(name, str)
end
def attr_value(name, value)
end
def attrs_done()
end
def doctype(str)
end
def comment(str)
end
def cdata(str)
end
def text(str)
end
def value(value)
end
def start_element(name)
end
def end_element(name)
end
def error(message, line, column)
end
def abort(name)
end
end # Sax
end # Ox
ox-2.8.2/lib/ox/document.rb 0000644 0000041 0000041 00000002032 13203413063 015530 0 ustar www-data www-data
module Ox
# Represents an XML document. It has a fixed set of attributes which form
# the XML prolog. A Document includes Elements.
class Document < Element
# Create a new Document.
# - +prolog+ [Hash] prolog attributes
# - _:version_ [String] version, typically '1.0' or '1.1'
# - _:encoding_ [String] encoding for the document, currently included but ignored
# - _:standalone_ [String] indicates the document is standalone
def initialize(prolog={})
super(nil)
@attributes = { }
@attributes[:version] = prolog[:version] unless prolog[:version].nil?
@attributes[:encoding] = prolog[:encoding] unless prolog[:encoding].nil?
@attributes[:standalone] = prolog[:standalone] unless prolog[:standalone].nil?
end
# Returns the first Element in the document.
def root()
unless !instance_variable_defined?(:@nodes) || @nodes.nil?
@nodes.each do |n|
return n if n.is_a?(::Ox::Element)
end
end
nil
end
end # Document
end # Ox
ox-2.8.2/lib/ox/hasattrs.rb 0000644 0000041 0000041 00000003651 13203413063 015553 0 ustar www-data www-data
module Ox
# An Object that includes the HasAttrs module can have attributes which are a Hash of String values and either String
# or Symbol keys.
#
# To access the attributes there are several options. One is to walk the attributes. The easiest for simple regularly
# formatted XML is to reference the attributes simply by name.
module HasAttrs
# Returns all the attributes of the including object (an Element or Instruct) as a Hash.
# *return* [Hash] all attributes and attribute values.
def attributes
@attributes = { } if !instance_variable_defined?(:@attributes) or @attributes.nil?
@attributes
end
# Returns the value of an attribute.
# - +attr+ [Symbol|String] attribute name or key to return the value for
def [](attr)
return nil unless instance_variable_defined?(:@attributes) and @attributes.is_a?(Hash)
@attributes[attr] or (attr.is_a?(String) ? @attributes[attr.to_sym] : @attributes[attr.to_s])
end
# Adds or sets an attribute of the including object.
# - +attr+ [Symbol|String] attribute name or key
# - +value+ [Object] value for the attribute
def []=(attr, value)
raise "argument to [] must be a Symbol or a String." unless attr.is_a?(Symbol) or attr.is_a?(String)
@attributes = { } if !instance_variable_defined?(:@attributes) or @attributes.nil?
@attributes[attr] = value.to_s
end
# Handles the 'easy' API that allows navigating a simple XML by
# referencing attributes by name.
# - +id+ [Symbol] element or attribute name
# *return* [String|nil] the attribute value
# _raise_ [NoMethodError] if no match is found
def method_missing(id, *args, &block)
ids = id.to_s
if instance_variable_defined?(:@attributes) && !@attributes.nil?
return @attributes[id] if @attributes.has_key?(id)
return @attributes[ids] if @attributes.has_key?(ids)
end
# Pass the missing method's name; Instruct has no name() method to call here.
raise NoMethodError.new("#{ids} not found", id)
end
end # HasAttrs
end # Ox
ox-2.8.2/lib/ox/version.rb 0000644 0000041 0000041 00000000107 13203413063 015400 0 ustar www-data www-data
module Ox
# Current version of the module.
VERSION = '2.8.2'
end
ox-2.8.2/lib/ox/bag.rb 0000644 0000041 0000041 00000006737 13203413063 014463 0 ustar www-data www-data
module Ox
# A generic class that is used only for storing attributes. It is the base
# Class for auto-generated classes in the storage system. Instance variables
# are added using the instance_variable_set() method. All instance variables
# can be accessed using the variable name (without the @ prefix). No setters
# are provided as the Class is intended for reading only.
class Bag
# The initializer can take multiple arguments in the form of key values
# where the key is the variable name and the value is the variable
# value. This is intended for testing purposes only.
# - +args+ [Hash] instance variable symbols and their values
#
# *Example*
#
# Ox::Bag.new(:@x => 42, :@y => 57)
#
def initialize(args={ })
args.each do |k,v|
self.instance_variable_set(k, v)
end
end
# Replaces the Object.respond_to?() method.
# - +m+ [Symbol] method symbol
# *return* [Boolean] true for any method that matches an instance variable
# reader, otherwise false.
def respond_to?(m)
return true if super
at_m = ('@' + m.to_s).to_sym
instance_variables.include?(at_m)
end
# Handles requests for variable values. Others cause an Exception to be
# raised.
# - +m+ [Symbol] method symbol
# *return* [Object] the value of the specified instance variable.
#
# _raise_ [ArgumentError] if an argument is given. Zero arguments expected.
#
# _raise_ [NoMethodError] if the instance variable is not defined.
def method_missing(m, *args, &block)
raise ArgumentError.new("wrong number of arguments (#{args.size} for 0) to method #{m}") unless args.nil? or args.empty?
at_m = ('@' + m.to_s).to_sym
raise NoMethodError.new("undefined method #{m}", m) unless instance_variable_defined?(at_m)
instance_variable_get(at_m)
end
# Replaces eql?() with something more reasonable for this Class.
# - +other+ [Object] Object to compare self to
# *return* [Boolean] true if each variable and value are the same, otherwise false.
def eql?(other)
return false if (other.nil? or self.class != other.class)
ova = other.instance_variables
iv = instance_variables
return false if ova.size != iv.size
iv.each do |vid|
return false if instance_variable_get(vid) != other.instance_variable_get(vid)
end
true
end
alias == eql?
# Define a new class based on the Ox::Bag class. This is used internally in
# the Ox module and is available to service wrappers that receive XML
# requests that include Objects of Classes not defined in the storage
# process.
# - +classname+ [String] Class name or symbol that includes Module names.
# *return* [Class] the specified Class, created as a subclass of Ox::Bag if it was not already defined.
#
# _raise_ [NameError] if the classname is invalid.
def self.define_class(classname)
classname = classname.to_s unless classname.is_a?(String)
tokens = classname.split('::').map { |n| n.to_sym }
raise NameError.new("Invalid classname '#{classname}'") if tokens.empty?
m = Object
tokens[0..-2].each do |sym|
if m.const_defined?(sym)
m = m.const_get(sym)
else
c = Module.new
m.const_set(sym, c)
m = c
end
end
sym = tokens[-1]
if m.const_defined?(sym)
c = m.const_get(sym)
else
c = Class.new(Ox::Bag)
m.const_set(sym, c)
end
c
end
end # Bag
end # Ox
ox-2.8.2/lib/ox/raw.rb 0000644 0000041 0000041 00000000574 13203413063 014514 0 ustar www-data www-data
module Ox
# Raw elements are used to inject existing XML strings into a document.
# WARNING: Use of this feature can result in invalid XML, since `value` is
# injected as-is.
class Raw < Node
# Creates a new Raw element with the specified value.
# - +value+ [String] XML string to inject as-is
def initialize(value)
super
end
end # Raw
end # Ox
ox-2.8.2/lib/ox/doctype.rb 0000644 0000041 0000041 00000000463 13203413063 015367 0 ustar www-data www-data
module Ox
# Represents a DOCTYPE in an XML document.
class DocType < Node
# Creates a DOCTYPE element with the content as a string specified in the
# value parameter.
# - +value+ [String] string value for the element
def initialize(value)
super
end
end # DocType
end # Ox
ox-2.8.2/lib/ox/xmlrpc_adapter.rb 0000644 0000041 0000041 00000001661 13203413063 016726 0 ustar www-data www-data
require 'ox'
module Ox
# This is an alternative parser for the stdlib xmlrpc library. It makes
# use of Ox and is based on REXMLStreamParser. To use it set it as the
# parser for an XMLRPC client:
#
# require 'xmlrpc/client'
# require 'ox/xmlrpc_adapter'
# client = XMLRPC::Client.new2('http://some_server/rpc')
# client.set_parser(Ox::StreamParser.new)
class StreamParser < XMLRPC::XMLParser::AbstractStreamParser
# Create a new instance.
def initialize
@parser_class = OxParser
end
# The SAX wrapper.
class OxParser < Ox::Sax
include XMLRPC::XMLParser::StreamParserMixin
alias :text :character
alias :end_element :endElement
alias :start_element :startElement
# Initiates the sax parser with the provided string.
def parse(str)
Ox.sax_parse(self, StringIO.new(str), :symbolize => false, :convert_special => true)
end
end
end
end
ox-2.8.2/lib/ox/node.rb 0000644 0000041 0000041 00000001251 13203413063 014641 0 ustar www-data www-data
module Ox
# The Node is the base class for all other classes in the Ox module.
class Node
# String value associated with the Node.
attr_accessor :value
# Creates a new Node with the specified String value.
# - +value+ [String] string value for the Node
def initialize(value)
@value = value.to_s
end
# Returns true if this Object and other are of the same type and have the
# equivalent value otherwise false is returned.
# - +other+ [Object] Object to compare _self_ to.
def eql?(other)
return false if (other.nil? or self.class != other.class)
other.value == self.value
end
alias == eql?
end # Node
end # Ox
ox-2.8.2/lib/ox/error.rb 0000644 0000041 0000041 00000001122 13203413063 015042 0 ustar www-data www-data
module Ox
# Base error class for Ox errors.
class Error < StandardError
end # Error
# An Exception that is raised as a result of a parse error while parsing an XML document.
class ParseError < Error
end # ParseError
# An Exception that is raised as a result of an invalid argument.
class ArgError < Error
end # ArgError
# An Exception raised if a path is not valid.
class InvalidPath < Error
# Create a new instance with the +path+ specified.
def initialize(path)
super("#{path.join('/')} is not a valid location.")
end
end # InvalidPath
end # Ox
ox-2.8.2/lib/ox/element.rb 0000644 0000041 0000041 00000026460 13203413063 015356 0 ustar www-data www-data
module Ox
# An Element represents an element of an XML document. It has a name,
# attributes, and sub-nodes.
#
# To access the child elements or attributes there are several options. One
# is to walk the nodes and attributes. Another is to use the locate()
# method. The easiest for simple regularly formatted XML is to reference the
# sub elements or attributes simply by name. Repeating elements with the
# same name can be referenced with an element count as well. A few examples
# should explain the 'easy' API more clearly.
#
# *Example*
#
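# doc = Ox.parse(%{
# <?xml?>
# <People>
#   <Person age="58">
#     <given>Peter</given>
#     <surname>Ohler</surname>
#   </Person>
#   <Person>
#     <given>Makie</given>
#     <surname>Ohler</surname>
#   </Person>
# </People>
# })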
#
# doc.People.Person.given.text
# => "Peter"
# doc.People.Person(1).given.text
# => "Makie"
# doc.People.Person.age
# => "58"
class Element < Node
include HasAttrs
# Creates a new Element with the specified name.
# - +name+ [String] name of the Element
def initialize(name)
super
@attributes = {}
@nodes = []
end
alias name value
# Returns the Element's nodes array. These are the sub-elements of this
# Element.
# *return* [Array] all child Nodes.
def nodes
@nodes = [] if !instance_variable_defined?(:@nodes) or @nodes.nil?
@nodes
end
# Appends a Node to the Element's nodes array. Returns the element itself
# so multiple appends can be chained together.
# - +node+ [Node] Node to append to the nodes array
def <<(node)
raise "argument to << must be a String or Ox::Node." unless node.is_a?(String) or node.is_a?(Node)
@nodes = [] if !instance_variable_defined?(:@nodes) or @nodes.nil?
@nodes << node
self
end
# Returns true if this Object and other are of the same type and have the
# equivalent value and the equivalent elements otherwise false is returned.
# - +other+ [Object] Object compare _self_ to.
# *return* [Boolean] true if both Objects are equivalent, otherwise false.
def eql?(other)
return false unless super(other)
return false unless self.attributes == other.attributes
return false unless self.nodes == other.nodes
true
end
alias == eql?
# Returns the first String in the elements nodes array or nil if there is
# no String node.
def text()
nodes.each { |n| return n if n.is_a?(String) }
nil
end
# Clears any child nodes of an element and replaces those with a single Text
# (String) node. Note the existing nodes array is modified and not replaced.
# - +txt+ [String] to become the only element of the nodes array
def replace_text(txt)
raise "the argument to replace_text() must be a String" unless txt.is_a?(String)
@nodes.clear()
@nodes << txt
end
# Return true if all the key-value pairs in the cond Hash match the
# @attributes key-values.
def attr_match(cond)
cond.each_pair { |k,v| return false unless v == @attributes[k.to_sym] || v == @attributes[k.to_s] }
true
end
# Iterate over each child of the instance yielding according to the cond
# argument value. If the cond argument is nil then all child nodes are
# yielded to. If cond is a string then only the child Elements with a
# matching name will be yielded to. If the cond is a Hash then the
# keys-value pairs in the cond must match the child attribute values with
# the same keys. Any other cond type will yield to nothing.
def each(cond=nil)
if cond.nil?
nodes.each { |n| yield(n) }
else
cond = cond.to_s if cond.is_a?(Symbol)
if cond.is_a?(String)
nodes.each { |n| yield(n) if n.is_a?(Element) && cond == n.name }
elsif cond.is_a?(Hash)
nodes.each { |n| yield(n) if n.is_a?(Element) && n.attr_match(cond) }
end
end
end
# Returns an array of Nodes or Strings that correspond to the locations
# specified by the path parameter. The path parameter describes the path
# to the return values which can be either nodes in the XML or
# attributes. The path is a relative description. There are similarities
# between the locate() method and XPath but locate does not follow the
# same rules as XPath. The syntax is meant to be simpler and more Ruby
# like.
#
# Like XPath the path delimiters are the slash (/) character. The path is
# split on the delimiter and each element of the path then describes the
# child of the current Element to traverse.
#
# Attributes are specified with an @ prefix.
#
# Each element name in the path can be followed by a bracket expression
# that narrows the paths to traverse. Supported expressions are numbers
# with a preceding qualifier. Qualifiers are -, +, <, and >. The +
# qualifier is the default. A - qualifier indicates the index begins at
# the end of the children just like for Ruby Arrays. The < and >
# qualifiers indicate that all elements with an index either less than or
# greater than the number should be matched. Note that, unlike XPath, the
# element index starts at 0, as it does for Ruby Arrays.
#
# Element names can also be wildcard characters. A * indicates any descendant should be followed. A ? indicates any
# single Element can match the wildcard. A ^ character followed by the name of a Class will match any node of the
# specified class. Valid class names are Element, Comment, String (or Text), CData, DocType.
#
# Examples are:
# * element.locate("Family/Pete/*") returns all children of the Pete Element.
# * element.locate("Family/?[1]") returns the second element in the Family Element, since the index starts at 0.
# * element.locate("Family/?[<3]") returns the first 3 elements in the Family Element.
# * element.locate("Family/?[@age=32]") returns the elements with an age attribute equal to 32 in the Family Element.
# * element.locate("Family/?/@age") returns the age attribute for each child in the Family Element.
# * element.locate("Family/*/@type") returns the type attribute value for descendants of the Family.
# * element.locate("Family/^Comment") returns any comments that are a child of Family.
#
# - +path+ [String] path to the Nodes to locate
def locate(path)
return [self] if path.nil?
found = []
pa = path.split('/')
if '*' == path[0]
# a bit of a hack but it allows self to be checked as well
e = Element.new('')
e << self
e.alocate(pa, found)
else
alocate(pa, found)
end
found
end
# Handles the 'easy' API that allows navigating a simple XML by
# referencing elements and attributes by name.
# - +id+ [Symbol] element or attribute name
# *return* [Element|Node|String|nil] the element, attribute value, or Node identified by the name
#
# _raise_ [NoMethodError] if no match is found
def method_missing(id, *args, &block)
has_some = false
ids = id.to_s
i = args[0].to_i # will be 0 if no arg or parsing fails
nodes.each do |n|
if (n.is_a?(Element) || n.is_a?(Instruct)) && (n.value == id || n.value == ids || name_matchs?(n.value, ids))
return n if 0 == i
has_some = true
i -= 1
end
end
if instance_variable_defined?(:@attributes)
return @attributes[id] if @attributes.has_key?(id)
return @attributes[ids] if @attributes.has_key?(ids)
end
return nil if has_some
raise NoMethodError.new("#{ids} not found", name)
end
# - +id+ [String|Symbol] identifier of the attribute or method
# - +ignored+ inc_all [Boolean]
# *return* true if the element has a member that matches the provided name.
def respond_to?(id, inc_all=false)
return true if super
id_str = id.to_s
id_sym = id.to_sym
nodes.each do |n|
next if n.is_a?(String)
return true if n.value == id_str || n.value == id_sym || name_matchs?(n.value, id_str)
end
if instance_variable_defined?(:@attributes) && !@attributes.nil?
return true if @attributes.has_key?(id_str)
return true if @attributes.has_key?(id_sym)
end
false
end
# - +path+ [Array] array of steps in a path
# - +found+ [Array] matching nodes
def alocate(path, found)
step = path[0]
if step.start_with?('@') # attribute
raise InvalidPath.new(path) unless 1 == path.size
if instance_variable_defined?(:@attributes)
step = step[1..-1]
sym_step = step.to_sym
@attributes.each do |k,v|
found << v if ('?' == step or k == step or k == sym_step)
end
end
else # element name
if (i = step.index('[')).nil? # just name
name = step
qual = nil
else
name = step[0..i-1]
raise InvalidPath.new(path) unless step.end_with?(']')
i += 1
qual = step[i..i] # step[i] would be better but some rubies (jruby, ree, rbx) take that as a Fixnum.
if '0' <= qual and qual <= '9'
qual = '+'
else
i += 1
end
index = step[i..-2].to_i
end
if '?' == name or '*' == name
match = nodes
elsif '^' == name[0..0] # 1.8.7 thinks name[0] is a fixnum
case name[1..-1]
when 'Element'
match = nodes.select { |e| e.is_a?(Element) }
when 'String', 'Text'
match = nodes.select { |e| e.is_a?(String) }
when 'Comment'
match = nodes.select { |e| e.is_a?(Comment) }
when 'CData'
match = nodes.select { |e| e.is_a?(CData) }
when 'DocType'
match = nodes.select { |e| e.is_a?(DocType) }
else
#puts "*** no match on #{name}"
match = []
end
else
match = nodes.select { |e| e.is_a?(Element) and name == e.name }
end
unless qual.nil? or match.empty?
case qual
when '+'
match = index < match.size ? [match[index]] : []
when '-'
match = index <= match.size ? [match[-index]] : []
when '<'
match = 0 < index ? match[0..index - 1] : []
when '>'
match = index <= match.size ? match[index + 1..-1] : []
when '@'
# i already points just past the '@', so this also works for names longer than one character.
k,v = step[i..-2].split('=')
match = match.select { |n| n.is_a?(Element) && (v == n.attributes[k.to_sym] || v == n.attributes[k]) }
else
raise InvalidPath.new(path)
end
end
if (1 == path.size)
match.each { |n| found << n }
elsif '*' == name
match.each { |n| n.alocate(path, found) if n.is_a?(Element) }
match.each { |n| n.alocate(path[1..-1], found) if n.is_a?(Element) }
else
match.each { |n| n.alocate(path[1..-1], found) if n.is_a?(Element) }
end
end
end
private
def name_matchs?(pat, id)
return false unless pat.length == id.length
pat.length.times { |i| return false unless '_' == id[i] || pat[i] == id[i] }
true
end
end # Element
end # Ox
ox-2.8.2/LICENSE 0000644 0000041 0000041 00000002065 13203413063 013204 0 ustar www-data www-data The MIT License (MIT)
Copyright (c) 2012 Peter Ohler
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE. ox-2.8.2/CHANGELOG.md 0000644 0000041 0000041 00000031620 13203413063 014007 0 ustar www-data www-data
## 2.8.2 - November 1, 2017
- Fixed bug with SAX parser that caused a crash with very long invalid instruction element.
- Fixed SAX parse error with double elements.
## 2.8.1 - October 27, 2017
- Avoid crash with invalid XML passed to Ox.parse_obj().
## 2.8.0 - September 22, 2017
- Added :skip_off mode to make the SAX parser call back on every non-empty string even
if there are no non-whitespace characters present.
## 2.7.0 - August 18, 2017
- Two new load modes added, :hash and :hash_no_attrs. Both load an XML
document to create a Hash populated with core Ruby objects.
- Worked around Ruby API change for RSTRUCT_LEN so Ruby 2.4.2 does not crash.
## 2.6.0 - August 9, 2017
- The Element#each() method was added to allow iteration over Element nodes conditionally.
- Element#locate() now supports a [@attr=value] specification.
- An underscore character used in the easy API is now treated as a wild card for valid XML characters that are not valid for Ruby method names.
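
The underscore wildcard can be sketched in plain Ruby. The matching rule below mirrors the private `name_matchs?` helper in `lib/ox/element.rb`; the method name `easy_name_match?` and the sample names are illustrative assumptions and not part of the Ox API.

```ruby
# Sketch of the easy-API name match: an underscore in the Ruby method name
# matches any single character at the same position in the XML name.
def easy_name_match?(xml_name, method_name)
  return false unless xml_name.length == method_name.length

  xml_name.each_char.with_index.all? do |ch, i|
    method_name[i] == '_' || method_name[i] == ch
  end
end

puts easy_name_match?('first-name', 'first_name') # hyphen matched by underscore
puts easy_name_match?('ns:item', 'ns_item')       # colon matched by underscore
puts easy_name_match?('first-name', 'last_name')  # lengths differ, no match
```

Because the comparison is positional, the method name must be exactly as long as the element name for a match to be possible.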
## 2.5.0 - May 4, 2017
- Set the default for skip to be to skip white space.
- Added a :nest_ok option to SAX hints that will ignore the nested check on a
tag to accommodate non-compliant HTML.
## 2.4.13 - April 21, 2017
- Corrected Builder special character handling.
## 2.4.12 - April 11, 2017
- Fixed position in builder when encoding special characters.
## 2.4.11 - March 19, 2017
- Fixed SAX parser bug regarding upper case hints not matching.
## 2.4.10 - February 13, 2017
- Dump is now smarter about which characters to replace with &xxx; alternatives.
## 2.4.9 - January 25, 2017
- Added a SAX hint that allows comments to be treated like other elements.
## 2.4.8 - January 15, 2017
- Tolerant mode now allows case-insensitive matches on elements during
parsing. Smart mode in the SAX parser is also case insensitive.
## 2.4.7 - December 25, 2016
- After encountering a <> the SAX parser will continue parsing after reporting an error.
## 2.4.6 - November 28, 2016
- Added margin option to dump.
## 2.4.5 - September 11, 2016
- Thanks to GUI for fixing an infinite loop in Ox::Builder.
## 2.4.4 - August 9, 2016
- Builder element attributes with special characters are now encoded correctly.
- A newline at end of an XML string is now controlled by the indent value. A
value of -1 indicates no terminating newline character and an indentation of
zero.
## 2.4.3 - June 26, 2016
- Fixed compiler warnings and errors.
- Updated for Ruby 2.4.0.
## 2.4.2 - June 23, 2016
- Added methods to Ox::Builder to provide output position information.
## 2.4.1 - April 30, 2016
- Made SAX a little smarter, or rather let it handle an unquoted string
with a / at the end.
- Fixed bug with reporting errors of element names that are too long.
- Added overlay feature to give control over which elements generate callbacks
with the SAX parser.
- Element.locate now includes self if the path is relative and starts with a wildcard.
## 2.4.0 - April 14, 2016
- Added Ox::Builder that constructs an XML string or writes XML to a stream
using builder methods.
## 2.3.0 - February 21, 2016
- Added Ox::Element.replace_text() method.
- Ox::Element nodes variable is now always initialized to an empty Array.
- Ox::Element attributes variable is now always initialized to an empty Hash.
- An invalid_replace option has been added. It will replace invalid XML
characters with a provided string. Strict effort now raises an exception if an
invalid character is encountered on dump or load.
- Ox.load and Ox.parse now allow for a callback block to handle multiple top
level entities in the input.
- The Ox SAX parser now supports strings as input directly without an IO wrapper.
## 2.2.4 - February 4, 2016
- Changed the code to allow compilation on older compilers. No change in
functionality otherwise.
## 2.2.3 - December 31, 2015
- The convert_special option now applies to attributes as well as elements in
the SAX parser.
- The convert_special option now applies to the regular parser as well as the
SAX parser.
- Updated to work correctly with Ruby 2.3.0.
## 2.2.2 - October 19, 2015
- Fixed problem with detecting invalid special character sequences.
- Fixed bug that caused a crash when an <> was encountered with the SAX parser.
## 2.2.1 - July 30, 2015
- Added support to handle script elements in html.
- Added support for position from start for the sax parser.
## 2.2.0 - April 20, 2015
- Added the SAX convert_special option to the default options.
- Added the SAX smart option to the default options.
- Other SAX options are now taken from the defaults if not specified.
## 2.1.8 - February 10, 2015
- Fixed a bug that caused all input to be read before parsing with the sax
parser and an IO.pipe.
## 2.1.7 - January 31, 2015
- Empty elements such as are now called back with empty text.
- Fixed GC problem that occurs with the new GC in Ruby 2.2 that garbage
collects Symbols.
## 2.1.6 - December 31, 2014
- Update licenses. No other changes.
## 2.1.5 - December 30, 2014
- Fixed symbol intern problem with Ruby 2.2.0. Symbols are not dynamic unless
rb_intern(). There does not seem to be a way to force symbols created with
encoding to be pinned.
## 2.1.4 - December 5, 2014
- Fixed bug where the parser always started at the first position in a stringio
instead of the current position.
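
The fix above turns on the difference between a stream's whole backing string and what remains after its current position. A minimal plain-Ruby sketch of that distinction, using only the standard library's StringIO; it does not call Ox, and the sample input and variable names are made up for illustration:

```ruby
require 'stringio'

# Hypothetical input: a one-line header followed by the XML payload.
io = StringIO.new("HEADER\n<top><mid/></top>")

# Consume the header so the stream position sits at the start of the XML.
header = io.gets

# The whole backing string, regardless of the current position.
from_start = io.string

# Only what remains after the current position, which is what a reader
# that honors the stream position should see.
from_current = io.read
```

A parser that reads from the current position would see only the XML payload here, not the header line.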
## 2.1.3 - July 25, 2014
- Added check for @attributes being nil. Reported, with a proposed fix, by Elana.
## 2.1.2 - July 17, 2014
- Added skip option to parsing. This allows white space to be collapsed in two
different ways.
- Added respond_to? method for easy access method checking.
## 2.1.1 - February 12, 2014
- Worked around a module reset and clear that occurs on some Rubies.
## 2.1.0 - February 2, 2014
- Thanks to jfontan Ox now includes support for XMLRPC.
## 2.0.12 - December 1, 2013
- Fixed problem compiling with latest version of Rubinius.
## 2.0.11 - October 17, 2013
- Added support for BigDecimals in :object mode.
## 2.0.10
- Small fix to not create an empty element from a closed element when using locate().
- Fixed to keep objects from being garbage collected in Ruby 2.x.
## 2.0.9 - September 2, 2013
- Fixed bug that did not allow ISO-8859-1 characters and caused a crash.
## 2.0.8 - August 6, 2013
- Allow single quoted strings in all modes.
## 2.0.7 - August 4, 2013
- Fixed DOCTYPE parsing to handle nested '>' characters.
## 2.0.6 - July 23, 2013
- Fixed bug in special character decoding that chopped off text.
- Limit depth on dump to 1000 to avoid core dump on circular references if the user does not specify circular.
- Handles dumping non-string values for attributes correctly by converting the value to a string.
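
The depth limit above guards a recursive walk against structures that contain themselves. A rough plain-Ruby sketch of the idea follows; it is not Ox's C implementation, and `MAX_DEPTH`, `DepthError`, and `walk` are hypothetical names chosen for illustration:

```ruby
# Hypothetical limit, mirroring the 1000 mentioned in the entry above.
MAX_DEPTH = 1000

class DepthError < StandardError; end

# Walk nested Arrays and Hashes, returning the deepest nesting level seen.
# Raises DepthError instead of recursing forever on a circular structure.
def walk(value, depth = 0)
  raise DepthError, "nesting deeper than #{MAX_DEPTH}" if depth > MAX_DEPTH

  case value
  when Array then value.map { |v| walk(v, depth + 1) }.max || depth
  when Hash  then value.values.map { |v| walk(v, depth + 1) }.max || depth
  else depth
  end
end

nested = [1, [2, [3]]]

circular = []
circular << circular # the array now contains itself
```

Calling `walk(nested)` returns normally, while `walk(circular)` raises `DepthError` once the limit is passed rather than exhausting the stack.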
## 2.0.5 - July 5, 2013
- Better support for special character encoding with Ruby 1.8.7.
## 2.0.4 - June 24, 2013
- Fixed SAX parser handling of &#nnnn; encoded characters.
## 2.0.3 - June 12, 2013
- Fixed excessive memory allocation issue for very large file parsing (half a gig).
## 2.0.2 - June 7, 2013
- Fixed buffer sliding window off by 1 error in the SAX parser.
## 2.0.1
- Added an attrs_done callback to the sax parser that will be called when all
attributes for an element have been read.
- Fixed bug in SAX parser where raising an exception in the handler routines
would not cleanup. The test put together by griffinmyers was a huge help.
- Reduced stack use in several places to improve fiber support.
- Changed exception handling to assure proper cleanup with new stack minimizing.
## 2.0.0 - April 16, 2013
- The SAX parser went through a significant re-write. The options have changed. It is now 15% faster on large files and
much better at recovering from errors. So much so that the tolerant option was removed and is now the default and
only behavior. A smart option was added however. The smart option recognizes a file as an HTML file and will apply a
simple set of validation rules that allow the HTML to be parsed more reasonably. Errors will cause callbacks but the
parsing continues with the best guess as to how to recover. Rubymaniac has helped with testing and prompted the
rewrite to support parsing HTML pages.
- HTML is now supported with the SAX parser. The parser knows that some tags do not have to be
closed. Other hints as to how to parse and when to raise errors are also included. The parser does its best to
continue parsing even after errors.
- Added symbolize option to the sax parser. This option, if set to false will use strings instead of symbols for
element and attribute names.
- A contrib directory was added for people to submit useful bits of code that can be used with Ox. The first
contributor is Notezen with a nice way of building XML.
## 1.9.4 - March 24, 2013
- SAX tolerant mode handles multiple elements in a document better.
## 1.9.3 - March 22, 2013
- mcarpenter fixed a compile problem with Cygwin.
- Now more tolerant when the :effort is set to :tolerant. Ox will let all sorts
of errors typical in HTML documents pass. The result may not be perfect but
at least parsed results are returned.
- Attribute values need not be quoted or they can be quoted with single
quotes or there can be no =value at all.
- Elements not terminated will be terminated by the next element
termination. This effect goes up until a match is found on the element
name.
- SAX parser also given a :tolerant option with the same tolerance as the string parser.
## 1.9.2 - March 9, 2013
- Fixed bug in the sax element name check that caused a memory write error.
## 1.9.1 - February 27, 2013
- Fixed the line numbers to be the start of the elements in the sax parser.
## 1.9.0 - February 25, 2013
- Added a new feature to Ox::Element.locate() that allows filtering by node Class.
- Added feature to the Sax parser. If @line is defined in the handler it is set to the line number of the xml file
before making callbacks. The same goes for @column but it is updated with the column.
## 1.8.9 - February 21, 2013
- Fixed bug in element start and end name checking.
## 1.8.8 - February 17, 2013
- Fixed bug in check for open and close element names matching.
## 1.8.7
- Added a correct check for element open and close names.
- Changed raised Exceptions to custom classes that inherit from StandardError.
- Fixed a few minor bugs.
## 1.8.6 - February 7, 2013
- Removed broken check for matching start and end element names in SAX mode. The names are still included in the
handler callbacks so the user can perform the check if desired.
## 1.8.5 - February 3, 2013
- Added encoding support for JRuby where possible when in 1.9 mode.
## 1.8.4 - January 25, 2013
- Applied patch by mcarpenter to fix solaris issues with build and remaining undefined @nodes.
## 1.8.3 - January 24, 2013
- Sax parser now honors encoding specification in the xml prolog correctly.
## 1.8.2 - January 18, 2013
- Ox::Element.locate no longer raises an exception if there are no child nodes.
- Dumping an XML document no longer puts a carriage return after processing instructions.
## 1.8.1 - December 17, 2012
- Fixed bug that caused a crash when an invalid xml with two elements and no was parsed. (issue #28)
- Modified the SAX parser to not strip white space from the start of string content.
## 1.8.0 - December 11, 2012
- Added more complete support for processing instructions in both the generic parser and in the sax parser. This change includes an additional sax handler callback for the end of the instruction processing.
## 1.7.1 - December 6, 2012
- Pulled in sharpyfox's changes to make Ox work with Windows. (issue #24)
- Fixed bug that ignored white space only text elements. (issue #26)
## 1.7.0 - November 27, 2012
- Added support for BOM in the SAX parser.
## 1.6.9 - November 25, 2012
- Added support for BOM. They are honored and handled correctly for UTF-8. Others cause encoding issues with Ruby or raise an error as they are not ASCII compatible.
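
For readers unfamiliar with the term, a UTF-8 byte order mark is the three bytes EF BB BF at the very start of the input. The plain-Ruby sketch below shows one way to detect and drop it before handing text to a parser. It is only an illustration of the concept, not how Ox handles it internally; `UTF8_BOM` and `strip_utf8_bom` are hypothetical names:

```ruby
# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Return the input as a UTF-8 string with a leading UTF-8 BOM removed.
# Input without a BOM is returned unchanged apart from the encoding tag.
def strip_utf8_bom(input)
  bytes = input.b # binary copy, so slicing works on bytes
  bytes = bytes[UTF8_BOM.bytesize..-1] if bytes.start_with?(UTF8_BOM)
  bytes.force_encoding(Encoding::UTF_8)
end

with_bom = strip_utf8_bom("\xEF\xBB\xBF<top/>")
without_bom = strip_utf8_bom("<top/>")
```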
## 1.6.8 - November 18, 2012
- Changed extconf.rb to use RUBY_PLATFORM.
## 1.6.7 - November 15, 2012
- Now uses the encoding of the input XML as the default encoding for the parsed output if the default options encoding is not set and the encoding is not set in the XML file prolog.
## 1.6.5 - October 25, 2012
- Special character handling now supports UCS-2 and UCS-4 Unicode characters as well as UTF-8 characters.
## 1.6.4 - October 24, 2012
- Special character handling has been improved. Both hex and base 10 numeric values are allowed up to a 64 bit number
for really long UTF-8 characters.
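
To make "hex and base 10 numeric values" concrete, here is a small plain-Ruby sketch that decodes decimal and hexadecimal numeric character references. It is an illustration only and does not reflect Ox's C implementation; `decode_numeric_refs` is a hypothetical helper name:

```ruby
# Replace decimal (&#NNN;) and hexadecimal (&#xHHH;) character references
# with the UTF-8 character for that code point. Named entities such as
# &amp; are left untouched in this sketch.
def decode_numeric_refs(text)
  text.gsub(/&#(x[0-9a-fA-F]+|[0-9]+);/) do
    ref = Regexp.last_match(1)
    code = ref.start_with?('x') ? ref[1..-1].to_i(16) : ref.to_i(10)
    [code].pack('U')
  end
end

decoded = decode_numeric_refs('caf&#233; &#x263A; &amp;')
```

Both `&#233;` (decimal) and `&#x263A;` (hexadecimal) are replaced by their characters, while `&amp;` passes through unchanged.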
## 1.6.3 - October 22, 2012
- Fixed compatibility issues with Linux (Ubuntu) mostly related to pointer sizes.
## 1.6.2 - October 7, 2012
- Added check for Solaris and Linux builds to not use the timezone member of time struct (struct tm).
## 1.6.1 - October 7, 2012
- Added check for Solaris builds to not use the timezone member of time struct (struct tm).
ox-2.8.2/ox.gemspec 0000644 0000041 0000041 00000004523 13203413064 014174 0 ustar www-data www-data #########################################################
# This file has been automatically generated by gem2tgz #
#########################################################
# -*- encoding: utf-8 -*-
Gem::Specification.new do |s|
s.name = "ox"
s.version = "2.8.2"
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
s.authors = ["Peter Ohler"]
s.date = "2017-11-01"
s.description = "A fast XML parser and object serializer that uses only standard C lib.\n \nOptimized XML (Ox), as the name implies was written to provide speed optimized\nXML handling. It was designed to be an alternative to Nokogiri and other Ruby\nXML parsers for generic XML parsing and as an alternative to Marshal for Object\nserialization. "
s.email = "peter@ohler.com"
s.extensions = ["ext/ox/extconf.rb"]
s.extra_rdoc_files = ["CHANGELOG.md", "README.md"]
s.files = ["CHANGELOG.md", "LICENSE", "README.md", "ext/ox/attr.h", "ext/ox/base64.c", "ext/ox/base64.h", "ext/ox/buf.h", "ext/ox/builder.c", "ext/ox/cache.c", "ext/ox/cache.h", "ext/ox/cache8.c", "ext/ox/cache8.h", "ext/ox/dump.c", "ext/ox/encode.h", "ext/ox/err.c", "ext/ox/err.h", "ext/ox/extconf.rb", "ext/ox/gen_load.c", "ext/ox/hash_load.c", "ext/ox/helper.h", "ext/ox/obj_load.c", "ext/ox/ox.c", "ext/ox/ox.h", "ext/ox/parse.c", "ext/ox/sax.c", "ext/ox/sax.h", "ext/ox/sax_as.c", "ext/ox/sax_buf.c", "ext/ox/sax_buf.h", "ext/ox/sax_has.h", "ext/ox/sax_hint.c", "ext/ox/sax_hint.h", "ext/ox/sax_stack.h", "ext/ox/special.c", "ext/ox/special.h", "ext/ox/type.h", "lib/ox.rb", "lib/ox/bag.rb", "lib/ox/cdata.rb", "lib/ox/comment.rb", "lib/ox/doctype.rb", "lib/ox/document.rb", "lib/ox/element.rb", "lib/ox/error.rb", "lib/ox/hasattrs.rb", "lib/ox/instruct.rb", "lib/ox/node.rb", "lib/ox/raw.rb", "lib/ox/sax.rb", "lib/ox/version.rb", "lib/ox/xmlrpc_adapter.rb"]
s.homepage = "http://www.ohler.com/ox"
s.licenses = ["MIT"]
s.rdoc_options = ["--main", "README.md", "--title", "Ox Documentation", "--exclude", "extconf.rb"]
s.require_paths = ["ext", "lib"]
s.rubyforge_project = "ox"
s.rubygems_version = "1.8.23"
s.summary = "A fast XML parser and object serializer."
if s.respond_to? :specification_version then
s.specification_version = 4
if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
else
end
else
end
end
ox-2.8.2/ext/ 0000755 0000041 0000041 00000000000 13203413063 012774 5 ustar www-data www-data ox-2.8.2/ext/ox/ 0000755 0000041 0000041 00000000000 13203413063 013422 5 ustar www-data www-data ox-2.8.2/ext/ox/cache.c 0000644 0000041 0000041 00000011135 13203413063 014632 0 ustar www-data www-data /* cache.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include
#include
#include
#include
#include
#include
#include
#include "cache.h"
struct _Cache {
/* The key is a length byte followed by the key as a string. If the key is longer than 254 characters then the
length is 255. The key can be for a premature value and in that case the length byte is greater than the length
of the key. */
char *key;
VALUE value;
struct _Cache *slots[16];
};
static void slot_print(Cache cache, unsigned int depth);
static char* form_key(const char *s) {
size_t len = strlen(s);
char *d = ALLOC_N(char, len + 2);
*(uint8_t*)d = (255 <= len) ? 255 : len;
memcpy(d + 1, s, len + 1);
return d;
}
void
ox_cache_new(Cache *cache) {
*cache = ALLOC(struct _Cache);
(*cache)->key = 0;
(*cache)->value = Qundef;
memset((*cache)->slots, 0, sizeof((*cache)->slots));
}
VALUE
ox_cache_get(Cache cache, const char *key, VALUE **slot, const char **keyp) {
unsigned char *k = (unsigned char*)key;
Cache *cp;
for (; '\0' != *k; k++) {
cp = cache->slots + (unsigned int)(*k >> 4); /* upper 4 bits */
if (0 == *cp) {
ox_cache_new(cp);
}
cache = *cp;
cp = cache->slots + (unsigned int)(*k & 0x0F); /* lower 4 bits */
if (0 == *cp) { /* nothing on this tree so set key and value as a premature key/value pair */
ox_cache_new(cp);
cache = *cp;
cache->key = form_key(key);
break;
} else {
int depth = (int)(k - (unsigned char*)key + 1);
cache = *cp;
if ('\0' == *(k + 1)) { /* exact match */
if (0 == cache->key) { /* nothing in this spot so take it */
cache->key = form_key(key);
break;
} else if ((depth == *cache->key || 255 < depth) && 0 == strcmp(key, cache->key + 1)) { /* match */
break;
} else { /* have to move the current premature key/value deeper */
unsigned char *ck = (unsigned char*)(cache->key + depth + 1);
Cache orig = *cp;
cp = (*cp)->slots + (*ck >> 4);
ox_cache_new(cp);
cp = (*cp)->slots + (*ck & 0x0F);
ox_cache_new(cp);
(*cp)->key = cache->key;
(*cp)->value = cache->value;
orig->key = form_key(key);
orig->value = Qundef;
}
} else { /* not exact match but on the path */
if (0 != cache->key) { /* there is a key/value here already */
if (depth == *cache->key || (255 <= depth && 0 == strncmp(cache->key, key, depth) && '\0' == cache->key[depth])) { /* key belongs here */
continue;
} else {
unsigned char *ck = (unsigned char*)(cache->key + depth + 1);
Cache orig = *cp;
cp = (*cp)->slots + (*ck >> 4);
ox_cache_new(cp);
cp = (*cp)->slots + (*ck & 0x0F);
ox_cache_new(cp);
(*cp)->key = cache->key;
(*cp)->value = cache->value;
orig->key = 0;
orig->value = Qundef;
}
}
}
}
}
*slot = &cache->value;
if (0 != keyp) {
if (0 == cache->key) {
printf("*** Error: failed to set the key for '%s'\n", key);
*keyp = 0;
} else {
*keyp = cache->key + 1;
}
}
return cache->value;
}
void
ox_cache_print(Cache cache) {
/*printf("-------------------------------------------\n");*/
slot_print(cache, 0);
}
static void
slot_print(Cache c, unsigned int depth) {
char indent[256];
Cache *cp;
unsigned int i;
if (sizeof(indent) - 1 < depth) {
depth = ((int)sizeof(indent) - 1);
}
memset(indent, ' ', depth);
indent[depth] = '\0';
for (i = 0, cp = c->slots; i < 16; i++, cp++) {
if (0 == *cp) {
/*printf("%s%02u:\n", indent, i);*/
} else {
if (0 == (*cp)->key && Qundef == (*cp)->value) {
printf("%s%02u:\n", indent, i);
} else {
const char *vs;
const char *clas;
if (Qundef == (*cp)->value) {
vs = "undefined";
clas = "";
} else {
VALUE rs = rb_funcall2((*cp)->value, rb_intern("to_s"), 0, 0);
vs = StringValuePtr(rs);
clas = rb_class2name(rb_obj_class((*cp)->value));
}
printf("%s%02u: %s = %s (%s)\n", indent, i, (*cp)->key, vs, clas);
}
slot_print(*cp, depth + 2);
}
}
}
ox-2.8.2/ext/ox/sax.h 0000644 0000041 0000041 00000002351 13203413063 014367 0 ustar www-data www-data /* sax.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_SAX_H__
#define __OX_SAX_H__
#include
#include "sax_buf.h"
#include "sax_has.h"
#include "sax_stack.h"
#include "sax_hint.h"
#include "ox.h"
typedef struct _SaxOptions {
int symbolize;
int convert_special;
int smart;
SkipMode skip;
char strip_ns[64];
Hints hints;
} *SaxOptions;
typedef struct _SaxDrive {
struct _Buf buf;
struct _NStack stack; /* element name stack */
VALUE handler;
VALUE value_obj;
struct _SaxOptions options;
int err;
int blocked;
bool abort;
struct _Has has;
#if HAS_ENCODING_SUPPORT
rb_encoding *encoding;
#elif HAS_PRIVATE_ENCODING
VALUE encoding;
#else
const char *encoding;
#endif
} *SaxDrive;
extern void ox_collapse_return(char *str);
extern void ox_sax_parse(VALUE handler, VALUE io, SaxOptions options);
extern void ox_sax_drive_cleanup(SaxDrive dr);
extern void ox_sax_drive_error(SaxDrive dr, const char *msg);
extern int ox_sax_collapse_special(SaxDrive dr, char *str, int pos, int line, int col);
extern VALUE ox_sax_value_class;
extern VALUE str2sym(SaxDrive dr, const char *str, const char **strp);
#endif /* __OX_SAX_H__ */
ox-2.8.2/ext/ox/obj_load.c 0000644 0000041 0000041 00000054512 13203413063 015346 0 ustar www-data www-data /* obj_load.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include
#include
#include
#include
#include
#include
#include "ruby.h"
#include "base64.h"
#include "ox.h"
static void instruct(PInfo pi, const char *target, Attr attrs, const char *content);
static void add_text(PInfo pi, char *text, int closed);
static void add_element(PInfo pi, const char *ename, Attr attrs, int hasChildren);
static void end_element(PInfo pi, const char *ename);
static VALUE parse_time(const char *text, VALUE clas);
static VALUE parse_xsd_time(const char *text, VALUE clas);
static VALUE parse_double_time(const char *text, VALUE clas);
static VALUE parse_regexp(const char *text);
static VALUE get_var_sym_from_attrs(Attr a, void *encoding);
static VALUE get_obj_from_attrs(Attr a, PInfo pi, VALUE base_class);
static VALUE get_class_from_attrs(Attr a, PInfo pi, VALUE base_class);
static VALUE classname2class(const char *name, PInfo pi, VALUE base_class);
static unsigned long get_id_from_attrs(PInfo pi, Attr a);
static CircArray circ_array_new(void);
static void circ_array_free(CircArray ca);
static void circ_array_set(CircArray ca, VALUE obj, unsigned long id);
static VALUE circ_array_get(CircArray ca, unsigned long id);
static void debug_stack(PInfo pi, const char *comment);
static void fill_indent(PInfo pi, char *buf, size_t size);
struct _ParseCallbacks _ox_obj_callbacks = {
instruct, /* instruct, */
0, /* add_doctype, */
0, /* add_comment, */
0, /* add_cdata, */
add_text,
add_element,
end_element,
NULL,
};
ParseCallbacks ox_obj_callbacks = &_ox_obj_callbacks;
extern ParseCallbacks ox_gen_callbacks;
inline static VALUE
str2sym(const char *str, void *encoding) {
VALUE sym;
#ifdef HAVE_RUBY_ENCODING_H
if (0 != encoding) {
VALUE rstr = rb_str_new2(str);
rb_enc_associate(rstr, (rb_encoding*)encoding);
sym = rb_funcall(rstr, ox_to_sym_id, 0);
} else {
sym = ID2SYM(rb_intern(str));
}
#else
sym = ID2SYM(rb_intern(str));
#endif
return sym;
}
inline static ID
name2var(const char *name, void *encoding) {
VALUE *slot;
ID var_id;
if ('0' <= *name && *name <= '9') {
var_id = INT2NUM(atoi(name));
} else if (Qundef == (var_id = ox_cache_get(ox_attr_cache, name, &slot, 0))) {
#ifdef HAVE_RUBY_ENCODING_H
if (0 != encoding) {
volatile VALUE rstr = rb_str_new2(name);
volatile VALUE sym;
rb_enc_associate(rstr, (rb_encoding*)encoding);
sym = rb_funcall(rstr, ox_to_sym_id, 0);
// Needed for Ruby 2.2 to get around the GC of symbols
// created with to_sym which is needed for encoded symbols.
rb_ary_push(ox_sym_bank, sym);
var_id = SYM2ID(sym);
} else {
var_id = rb_intern(name);
}
#else
var_id = rb_intern(name);
#endif
*slot = var_id;
}
return var_id;
}
inline static VALUE
resolve_classname(VALUE mod, const char *class_name, Effort effort, VALUE base_class) {
VALUE clas;
ID ci = rb_intern(class_name);
switch (effort) {
case TolerantEffort:
if (rb_const_defined_at(mod, ci)) {
clas = rb_const_get_at(mod, ci);
} else {
clas = Qundef;
}
break;
case AutoEffort:
if (rb_const_defined_at(mod, ci)) {
clas = rb_const_get_at(mod, ci);
} else {
clas = rb_define_class_under(mod, class_name, base_class);
}
break;
case StrictEffort:
default:
/* raise an error if name is not defined */
clas = rb_const_get_at(mod, ci);
break;
}
return clas;
}
inline static VALUE
classname2obj(const char *name, PInfo pi, VALUE base_class) {
VALUE clas = classname2class(name, pi, base_class);
if (Qundef == clas) {
return Qnil;
} else {
return rb_obj_alloc(clas);
}
}
#if HAS_RSTRUCT
inline static VALUE
structname2obj(const char *name) {
VALUE ost;
const char *s = name;
for (; 1; s++) {
if ('\0' == *s) {
s = name;
break;
} else if (':' == *s) {
s += 2;
break;
}
}
ost = rb_const_get(ox_struct_class, rb_intern(s));
/* use encoding as the indicator for Ruby 1.8.7 or 1.9.x */
#if HAS_ENCODING_SUPPORT
return rb_struct_alloc_noinit(ost);
#elif HAS_PRIVATE_ENCODING
return rb_struct_alloc_noinit(ost);
#else
return rb_struct_new(ost);
#endif
}
#endif
inline static VALUE
parse_ulong(const char *s, PInfo pi) {
unsigned long n = 0;
for (; '\0' != *s; s++) {
if ('0' <= *s && *s <= '9') {
n = n * 10 + (*s - '0');
} else {
set_error(&pi->err, "Invalid number for a julian day", pi->str, pi->s);
return Qundef;
}
}
return ULONG2NUM(n);
}
/* 2010-07-09T10:47:45.895826162+09:00 */
inline static VALUE
parse_time(const char *text, VALUE clas) {
VALUE t;
if (Qnil == (t = parse_double_time(text, clas)) &&
Qnil == (t = parse_xsd_time(text, clas))) {
VALUE args[1];
/*printf("**** time parse\n"); */
*args = rb_str_new2(text);
t = rb_funcall2(ox_time_class, ox_parse_id, 1, args);
}
return t;
}
static VALUE
classname2class(const char *name, PInfo pi, VALUE base_class) {
VALUE *slot;
VALUE clas;
if (Qundef == (clas = ox_cache_get(ox_class_cache, name, &slot, 0))) {
char class_name[1024];
char *s;
const char *n = name;
clas = rb_cObject;
for (s = class_name; '\0' != *n; n++) {
if (':' == *n) {
*s = '\0';
n++;
if (':' != *n) {
set_error(&pi->err, "Invalid classname, expected another ':'", pi->str, pi->s);
return Qundef;
}
if (Qundef == (clas = resolve_classname(clas, class_name, pi->options->effort, base_class))) {
return Qundef;
}
s = class_name;
} else {
*s++ = *n;
}
}
*s = '\0';
if (Qundef != (clas = resolve_classname(clas, class_name, pi->options->effort, base_class))) {
*slot = clas;
}
}
return clas;
}
static VALUE
get_var_sym_from_attrs(Attr a, void *encoding) {
for (; 0 != a->name; a++) {
if ('a' == *a->name && '\0' == *(a->name + 1)) {
return name2var(a->value, encoding);
}
}
return Qundef;
}
static VALUE
get_obj_from_attrs(Attr a, PInfo pi, VALUE base_class) {
for (; 0 != a->name; a++) {
if ('c' == *a->name && '\0' == *(a->name + 1)) {
return classname2obj(a->value, pi, base_class);
}
}
return Qundef;
}
#if HAS_RSTRUCT
static VALUE
get_struct_from_attrs(Attr a) {
for (; 0 != a->name; a++) {
if ('c' == *a->name && '\0' == *(a->name + 1)) {
return structname2obj(a->value);
}
}
return Qundef;
}
#endif
static VALUE
get_class_from_attrs(Attr a, PInfo pi, VALUE base_class) {
for (; 0 != a->name; a++) {
if ('c' == *a->name && '\0' == *(a->name + 1)) {
return classname2class(a->value, pi, base_class);
}
}
return Qundef;
}
static unsigned long
get_id_from_attrs(PInfo pi, Attr a) {
for (; 0 != a->name; a++) {
if ('i' == *a->name && '\0' == *(a->name + 1)) {
unsigned long id = 0;
const char *text = a->value;
char c;
for (; '\0' != *text; text++) {
c = *text;
if ('0' <= c && c <= '9') {
id = id * 10 + (c - '0');
} else {
set_error(&pi->err, "bad number format", pi->str, pi->s);
return 0;
}
}
return id;
}
}
return 0;
}
static CircArray
circ_array_new() {
CircArray ca;
ca = ALLOC(struct _CircArray);
ca->objs = ca->obj_array;
ca->size = sizeof(ca->obj_array) / sizeof(VALUE);
ca->cnt = 0;
return ca;
}
static void
circ_array_free(CircArray ca) {
if (ca->objs != ca->obj_array) {
xfree(ca->objs);
}
xfree(ca);
}
static void
circ_array_set(CircArray ca, VALUE obj, unsigned long id) {
if (0 < id) {
unsigned long i;
if (ca->size < id) {
unsigned long cnt = id + 512;
if (ca->objs == ca->obj_array) {
ca->objs = ALLOC_N(VALUE, cnt);
memcpy(ca->objs, ca->obj_array, sizeof(VALUE) * ca->cnt);
} else {
REALLOC_N(ca->objs, VALUE, cnt);
}
ca->size = cnt;
}
id--;
for (i = ca->cnt; i < id; i++) {
ca->objs[i] = Qundef;
}
ca->objs[id] = obj;
if (ca->cnt <= id) {
ca->cnt = id + 1;
}
}
}
static VALUE
circ_array_get(CircArray ca, unsigned long id) {
VALUE obj = Qundef;
if (0 < id && id <= ca->cnt) { /* id 0 means "no id"; avoid reading objs[-1] */
obj = ca->objs[id - 1];
}
return obj;
}
static VALUE
parse_regexp(const char *text) {
const char *te;
int options = 0;
te = text + strlen(text) - 1;
#if HAS_ONIG
for (; text < te && '/' != *te; te--) {
switch (*te) {
case 'i': options |= ONIG_OPTION_IGNORECASE; break;
case 'm': options |= ONIG_OPTION_MULTILINE; break;
case 'x': options |= ONIG_OPTION_EXTEND; break;
default: break;
}
}
#endif
return rb_reg_new(text + 1, te - text - 1, options);
}
static void
instruct(PInfo pi, const char *target, Attr attrs, const char *content) {
if (0 == strcmp("xml", target)) {
#if HAS_ENCODING_SUPPORT
for (; 0 != attrs->name; attrs++) {
if (0 == strcmp("encoding", attrs->name)) {
pi->options->rb_enc = rb_enc_find(attrs->value);
}
}
#elif HAS_PRIVATE_ENCODING
for (; 0 != attrs->name; attrs++) {
if (0 == strcmp("encoding", attrs->name)) {
pi->options->rb_enc = rb_str_new2(attrs->value);
}
}
#endif
}
}
static void
add_text(PInfo pi, char *text, int closed) {
Helper h = helper_stack_peek(&pi->helpers);
if (!closed) {
set_error(&pi->err, "Text not closed", pi->str, pi->s);
return;
}
if (0 == h) {
set_error(&pi->err, "Unexpected text", pi->str, pi->s);
return;
}
if (DEBUG <= pi->options->trace) {
char indent[128];
fill_indent(pi, indent, sizeof(indent));
printf("%s '%s' to type %c\n", indent, text, h->type);
}
switch (h->type) {
case NoCode:
case StringCode:
h->obj = rb_str_new2(text);
#if HAS_ENCODING_SUPPORT
if (0 != pi->options->rb_enc) {
rb_enc_associate(h->obj, pi->options->rb_enc);
}
#elif HAS_PRIVATE_ENCODING
if (Qnil != pi->options->rb_enc) {
rb_funcall(h->obj, ox_force_encoding_id, 1, pi->options->rb_enc);
}
#endif
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, h->obj, (unsigned long)pi->id);
}
break;
case FixnumCode:
{
long n = 0;
char c;
int neg = 0;
if ('-' == *text) {
neg = 1;
text++;
}
for (; '\0' != *text; text++) {
c = *text;
if ('0' <= c && c <= '9') {
n = n * 10 + (c - '0');
} else {
set_error(&pi->err, "bad number format", pi->str, pi->s);
return;
}
}
if (neg) {
n = -n;
}
h->obj = LONG2NUM(n);
break;
}
case FloatCode:
h->obj = rb_float_new(strtod(text, 0));
break;
case SymbolCode:
{
VALUE sym;
VALUE *slot;
if (Qundef == (sym = ox_cache_get(ox_symbol_cache, text, &slot, 0))) {
sym = str2sym(text, (void*)pi->options->rb_enc);
// Needed for Ruby 2.2 to get around the GC of symbols created with
// to_sym which is needed for encoded symbols.
rb_ary_push(ox_sym_bank, sym);
*slot = sym;
}
h->obj = sym;
break;
}
case DateCode:
{
VALUE args[1];
if (Qundef == (*args = parse_ulong(text, pi))) {
return;
}
h->obj = rb_funcall2(ox_date_class, ox_jd_id, 1, args);
break;
}
case TimeCode:
h->obj = parse_time(text, ox_time_class);
break;
case String64Code:
{
unsigned long str_size = b64_orig_size(text);
VALUE v;
char *str = ALLOCA_N(char, str_size + 1);
from_base64(text, (uchar*)str);
v = rb_str_new(str, str_size);
#if HAS_ENCODING_SUPPORT
if (0 != pi->options->rb_enc) {
rb_enc_associate(v, pi->options->rb_enc);
}
#elif HAS_PRIVATE_ENCODING
if (0 != pi->options->rb_enc) {
rb_funcall(v, ox_force_encoding_id, 1, pi->options->rb_enc);
}
#endif
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, v, (unsigned long)pi->id); /* id saved by add_element */
}
h->obj = v;
break;
}
case Symbol64Code:
{
VALUE sym;
VALUE *slot;
unsigned long str_size = b64_orig_size(text);
char *str = ALLOCA_N(char, str_size + 1);
from_base64(text, (uchar*)str);
if (Qundef == (sym = ox_cache_get(ox_symbol_cache, str, &slot, 0))) {
sym = str2sym(str, (void*)pi->options->rb_enc);
// Needed for Ruby 2.2 to get around the GC of symbols created with
// to_sym which is needed for encoded symbols.
rb_ary_push(ox_sym_bank, sym);
*slot = sym;
}
h->obj = sym;
break;
}
case RegexpCode:
if ('/' == *text) {
h->obj = parse_regexp(text);
} else {
unsigned long str_size = b64_orig_size(text);
char *str = ALLOCA_N(char, str_size + 1);
from_base64(text, (uchar*)str);
h->obj = parse_regexp(str);
}
break;
case BignumCode:
h->obj = rb_cstr_to_inum(text, 10, 1);
break;
case BigDecimalCode:
#if HAS_BIGDECIMAL
h->obj = rb_funcall(ox_bigdecimal_class, ox_new_id, 1, rb_str_new2(text));
#else
h->obj = Qnil;
#endif
break;
default:
h->obj = Qnil;
break;
}
}
static void
add_element(PInfo pi, const char *ename, Attr attrs, int hasChildren) {
Attr a;
Helper h;
unsigned long id;
if (TRACE <= pi->options->trace) {
char buf[1024];
char indent[128];
char *s = buf;
char *end = buf + sizeof(buf) - 2;
s += snprintf(s, end - s, " <%s%s", (hasChildren) ? "" : "/", ename);
for (a = attrs; 0 != a->name; a++) {
s += snprintf(s, end - s, " %s=%s", a->name, a->value);
}
*s++ = '>';
*s++ = '\0';
if (DEBUG <= pi->options->trace) {
printf("===== add element stack(%d) =====\n", helper_stack_depth(&pi->helpers));
debug_stack(pi, buf);
} else {
fill_indent(pi, indent, sizeof(indent));
printf("%s%s\n", indent, buf);
}
}
if (helper_stack_empty(&pi->helpers)) { /* top level object */
if (0 != (id = get_id_from_attrs(pi, attrs))) {
pi->circ_array = circ_array_new();
}
}
if ('\0' != ename[1]) {
set_error(&pi->err, "Invalid element name", pi->str, pi->s);
return;
}
h = helper_stack_push(&pi->helpers, get_var_sym_from_attrs(attrs, (void*)pi->options->rb_enc), Qundef, *ename);
switch (h->type) {
case NilClassCode:
h->obj = Qnil;
break;
case TrueClassCode:
h->obj = Qtrue;
break;
case FalseClassCode:
h->obj = Qfalse;
break;
case StringCode:
/* h->obj will be replaced by add_text if it is called */
h->obj = ox_empty_string;
if (0 != pi->circ_array) {
pi->id = get_id_from_attrs(pi, attrs);
circ_array_set(pi->circ_array, h->obj, pi->id);
}
break;
case FixnumCode:
case FloatCode:
case SymbolCode:
case Symbol64Code:
case RegexpCode:
case BignumCode:
case BigDecimalCode:
case ComplexCode:
case DateCode:
case TimeCode:
case RationalCode: /* sub elements read next */
/* value will be read in the following add_text */
h->obj = Qundef;
break;
case String64Code:
h->obj = Qundef;
if (0 != pi->circ_array) {
pi->id = get_id_from_attrs(pi, attrs);
}
break;
case ArrayCode:
h->obj = rb_ary_new();
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
break;
case HashCode:
h->obj = rb_hash_new();
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
break;
case RangeCode:
h->obj = rb_range_new(ox_zero_fixnum, ox_zero_fixnum, Qfalse);
break;
case RawCode:
if (hasChildren) {
h->obj = ox_parse(pi->s, ox_gen_callbacks, &pi->s, pi->options, &pi->err);
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
} else {
h->obj = Qnil;
}
break;
case ExceptionCode:
if (Qundef == (h->obj = get_obj_from_attrs(attrs, pi, rb_eException))) {
return;
}
if (0 != pi->circ_array && Qnil != h->obj) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
break;
case ObjectCode:
if (Qundef == (h->obj = get_obj_from_attrs(attrs, pi, ox_bag_clas))) {
return;
}
if (0 != pi->circ_array && Qnil != h->obj) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
break;
case StructCode:
#if HAS_RSTRUCT
h->obj = get_struct_from_attrs(attrs);
if (0 != pi->circ_array) {
circ_array_set(pi->circ_array, h->obj, get_id_from_attrs(pi, attrs));
}
#else
set_error(&pi->err, "Ruby structs not supported with this version of Ruby", pi->str, pi->s);
return;
#endif
break;
case ClassCode:
if (Qundef == (h->obj = get_class_from_attrs(attrs, pi, ox_bag_clas))) {
return;
}
break;
case RefCode:
h->obj = Qundef;
if (0 != pi->circ_array) {
h->obj = circ_array_get(pi->circ_array, get_id_from_attrs(pi, attrs));
}
if (Qundef == h->obj) {
set_error(&pi->err, "Invalid circular reference", pi->str, pi->s);
return;
}
break;
default:
set_error(&pi->err, "Invalid element name", pi->str, pi->s);
return;
break;
}
if (DEBUG <= pi->options->trace) {
debug_stack(pi, " -----------");
}
}
static void
end_element(PInfo pi, const char *ename) {
if (TRACE <= pi->options->trace) {
char indent[128];
if (DEBUG <= pi->options->trace) {
char buf[1024];
printf("===== end element stack(%d) =====\n", helper_stack_depth(&pi->helpers));
snprintf(buf, sizeof(buf) - 1, "%s>", ename);
debug_stack(pi, buf);
} else {
fill_indent(pi, indent, sizeof(indent));
printf("%s%s>\n", indent, ename);
}
}
if (!helper_stack_empty(&pi->helpers)) {
Helper h = helper_stack_pop(&pi->helpers);
Helper ph = helper_stack_peek(&pi->helpers);
if (ox_empty_string == h->obj) {
/* special catch for empty strings */
h->obj = rb_str_new2("");
}
pi->obj = h->obj;
if (0 != ph) {
switch (ph->type) {
case ArrayCode:
rb_ary_push(ph->obj, h->obj);
break;
case ExceptionCode:
case ObjectCode:
if (Qnil != ph->obj) {
rb_ivar_set(ph->obj, h->var, h->obj);
}
break;
case StructCode:
#if HAS_RSTRUCT
rb_struct_aset(ph->obj, h->var, h->obj);
#else
set_error(&pi->err, "Ruby structs not supported with this version of Ruby", pi->str, pi->s);
return;
#endif
break;
case HashCode:
// put back h
helper_stack_push(&pi->helpers, h->var, h->obj, KeyCode);
break;
case RangeCode:
#if HAS_RSTRUCT
if (ox_beg_id == h->var) {
RSTRUCT_SET(ph->obj, 0, h->obj);
} else if (ox_end_id == h->var) {
RSTRUCT_SET(ph->obj, 1, h->obj);
} else if (ox_excl_id == h->var) {
RSTRUCT_SET(ph->obj, 2, h->obj);
} else {
set_error(&pi->err, "Invalid range attribute", pi->str, pi->s);
return;
}
#else
set_error(&pi->err, "Ruby structs not supported with this version of Ruby", pi->str, pi->s);
return;
#endif
break;
case KeyCode:
{
Helper gh;
helper_stack_pop(&pi->helpers);
if (NULL == (gh = helper_stack_peek(&pi->helpers))) {
set_error(&pi->err, "Corrupt parse stack, container is wrong type", pi->str, pi->s);
return;
}
rb_hash_aset(gh->obj, ph->obj, h->obj);
}
break;
case ComplexCode:
#ifdef T_COMPLEX
if (Qundef == ph->obj) {
ph->obj = h->obj;
} else {
ph->obj = rb_complex_new(ph->obj, h->obj);
}
#else
set_error(&pi->err, "Complex Objects not implemented in Ruby 1.8.7", pi->str, pi->s);
return;
#endif
break;
case RationalCode:
#ifdef T_RATIONAL
if (Qundef == ph->obj) {
ph->obj = h->obj;
} else {
#ifdef RUBINIUS_RUBY
ph->obj = rb_Rational(ph->obj, h->obj);
#else
ph->obj = rb_rational_new(ph->obj, h->obj);
#endif
}
#else
set_error(&pi->err, "Rational Objects not implemented in Ruby 1.8.7", pi->str, pi->s);
return;
#endif
break;
default:
set_error(&pi->err, "Corrupt parse stack, container is wrong type", pi->str, pi->s);
return;
break;
}
}
}
if (0 != pi->circ_array && helper_stack_empty(&pi->helpers)) {
circ_array_free(pi->circ_array);
pi->circ_array = 0;
}
if (DEBUG <= pi->options->trace) {
debug_stack(pi, " ----------");
}
}
static VALUE
parse_double_time(const char *text, VALUE clas) {
long v = 0;
long v2 = 0;
const char *dot = 0;
char c;
for (; '.' != *text; text++) {
c = *text;
if (c < '0' || '9' < c) {
return Qnil;
}
v = 10 * v + (long)(c - '0');
}
dot = text++;
for (; '\0' != *text && text - dot <= 6; text++) {
c = *text;
if (c < '0' || '9' < c) {
return Qnil;
}
v2 = 10 * v2 + (long)(c - '0');
}
for (; text - dot <= 9; text++) {
v2 *= 10;
}
#if HAS_NANO_TIME
return rb_time_nano_new(v, v2);
#else
return rb_time_new(v, v2 / 1000);
#endif
}
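/* A minimal, Ruby-free sketch of the digit handling in parse_double_time()
 * above. The names demo_parse_double_time, sec and nsec are illustrative
 * assumptions and not part of Ox; the sketch hands back two numbers rather
 * than building a Time, and it pads with a counter where the original advances
 * the text pointer. It shows that at most six fractional digits are read and
 * that the fraction is then scaled to nanoseconds.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 and fills *sec / *nsec on success, 0 on
 * malformed input. */
static int
demo_parse_double_time(const char *text, long *sec, long *nsec) {
    long        v = 0;
    long        v2 = 0;
    long        n;
    const char *dot;
    char        c;

    assert(NULL != text && NULL != sec && NULL != nsec);
    /* Whole seconds: digits up to the '.'. A missing '.' ends in '\0',
     * which is not a digit, so such input is rejected. */
    for (; '.' != *text; text++) {
        c = *text;
        if (c < '0' || '9' < c) {
            return 0;
        }
        v = 10 * v + (long)(c - '0');
    }
    dot = text++;
    /* Fraction: read at most six digits (microsecond precision). */
    for (; '\0' != *text && text - dot <= 6; text++) {
        c = *text;
        if (c < '0' || '9' < c) {
            return 0;
        }
        v2 = 10 * v2 + (long)(c - '0');
    }
    /* Scale whatever was read up to nine digits, i.e. nanoseconds. */
    for (n = (long)(text - dot); n <= 9; n++) {
        v2 *= 10;
    }
    *sec = v;
    *nsec = v2;
    return 1;
}
```

/* With this sketch, "12.25" yields 12 seconds and 250000000 nanoseconds, and a
 * seventh fractional digit is ignored without being validated. */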
typedef struct _Tp {
int cnt;
char end;
char alt;
} *Tp;
static VALUE
parse_xsd_time(const char *text, VALUE clas) {
long cargs[10];
long *cp = cargs;
long v;
int i;
char c;
struct _Tp tpa[10] = { { 4, '-', '-' },
{ 2, '-', '-' },
{ 2, 'T', 'T' },
{ 2, ':', ':' },
{ 2, ':', ':' },
{ 2, '.', '.' },
{ 9, '+', '-' },
{ 2, ':', ':' },
{ 2, '\0', '\0' },
{ 0, '\0', '\0' } };
Tp tp = tpa;
struct tm tm;
for (; 0 != tp->cnt; tp++) {
for (i = tp->cnt, v = 0; 0 < i ; text++, i--) {
c = *text;
if (c < '0' || '9' < c) {
if (tp->end == c || tp->alt == c) {
break;
}
return Qnil;
}
v = 10 * v + (long)(c - '0');
}
c = *text++;
if (tp->end != c && tp->alt != c) {
return Qnil;
}
*cp++ = v;
}
tm.tm_year = (int)cargs[0] - 1900;
tm.tm_mon = (int)cargs[1] - 1;
tm.tm_mday = (int)cargs[2];
tm.tm_hour = (int)cargs[3];
tm.tm_min = (int)cargs[4];
tm.tm_sec = (int)cargs[5];
#if HAS_NANO_TIME
return rb_time_nano_new(mktime(&tm), cargs[6]);
#else
return rb_time_new(mktime(&tm), cargs[6] / 1000);
#endif
}
/* debug functions */
static void
fill_indent(PInfo pi, char *buf, size_t size) {
size_t cnt;
if (0 < (cnt = helper_stack_depth(&pi->helpers))) {
cnt *= 2;
if (size < cnt + 1) {
cnt = size - 1;
}
memset(buf, ' ', cnt);
buf += cnt;
}
*buf = '\0';
}
static void
debug_stack(PInfo pi, const char *comment) {
char indent[128];
Helper h;
fill_indent(pi, indent, sizeof(indent));
printf("%s%s\n", indent, comment);
if (!helper_stack_empty(&pi->helpers)) {
for (h = pi->helpers.head; h < pi->helpers.tail; h++) {
const char *clas = "---";
const char *key = "---";
if (Qundef != h->obj) {
VALUE c = rb_obj_class(h->obj);
clas = rb_class2name(c);
}
if (Qundef != h->var) {
if (HashCode == h->type) {
VALUE v;
v = rb_funcall2(h->var, rb_intern("to_s"), 0, 0);
key = StringValuePtr(v);
} else if (ObjectCode == (h - 1)->type || ExceptionCode == (h - 1)->type || RangeCode == (h - 1)->type || StructCode == (h - 1)->type) {
key = rb_id2name(h->var);
} else {
printf("%s*** corrupt stack ***\n", indent);
}
}
printf("%s [%c] %s : %s\n", indent, h->type, clas, key);
}
}
}
/* ox-2.8.2/ext/ox/sax_hint.c */
/* hint.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdbool.h>
#include <string.h>
#include <strings.h>
#include <ruby.h>
#include "sax_hint.h"
static const char *audio_video_0[] = { "audio", "video", 0 };
static const char *colgroup_0[] = { "colgroup", 0 };
static const char *details_0[] = { "details", 0 };
static const char *dl_0[] = { "dl", 0 };
static const char *dt_th_0[] = { "dt", "th", 0 };
static const char *fieldset_0[] = { "fieldset", 0 };
static const char *figure_0[] = { "figure", 0 };
static const char *frameset_0[] = { "frameset", 0 };
static const char *head_0[] = { "head", 0 };
static const char *html_0[] = { "html", 0 };
static const char *map_0[] = { "map", 0 };
static const char *ol_ul_menu_0[] = { "ol", "ul", "menu", 0 };
static const char *optgroup_select_datalist_0[] = { "optgroup", "select", "datalist", 0 };
static const char *ruby_0[] = { "ruby", 0 };
static const char *table_0[] = { "table", 0 };
static const char *tr_0[] = { "tr", 0 };
static struct _Hint html_hint_array[] = {
{ "!--", false, false, false, ActiveOverlay, NULL }, // comment
{ "a", false, false, false, ActiveOverlay, NULL },
{ "abbr", false, false, false, ActiveOverlay, NULL },
{ "acronym", false, false, false, ActiveOverlay, NULL },
{ "address", false, false, false, ActiveOverlay, NULL },
{ "applet", false, false, false, ActiveOverlay, NULL },
{ "area", true, false, false, ActiveOverlay, map_0 },
{ "article", false, false, false, ActiveOverlay, NULL },
{ "aside", false, false, false, ActiveOverlay, NULL },
{ "audio", false, false, false, ActiveOverlay, NULL },
{ "b", false, false, false, ActiveOverlay, NULL },
{ "base", true, false, false, ActiveOverlay, head_0 },
{ "basefont", true, false, false, ActiveOverlay, head_0 },
{ "bdi", false, false, false, ActiveOverlay, NULL },
{ "bdo", false, true, false, ActiveOverlay, NULL },
{ "big", false, false, false, ActiveOverlay, NULL },
{ "blockquote", false, false, false, ActiveOverlay, NULL },
{ "body", false, false, false, ActiveOverlay, html_0 },
{ "br", true, false, false, ActiveOverlay, NULL },
{ "button", false, false, false, ActiveOverlay, NULL },
{ "canvas", false, false, false, ActiveOverlay, NULL },
{ "caption", false, false, false, ActiveOverlay, table_0 },
{ "center", false, false, false, ActiveOverlay, NULL },
{ "cite", false, false, false, ActiveOverlay, NULL },
{ "code", false, false, false, ActiveOverlay, NULL },
{ "col", true, false, false, ActiveOverlay, colgroup_0 },
{ "colgroup", false, false, false, ActiveOverlay, NULL },
{ "command", true, false, false, ActiveOverlay, NULL },
{ "datalist", false, false, false, ActiveOverlay, NULL },
{ "dd", false, false, false, ActiveOverlay, dl_0 },
{ "del", false, false, false, ActiveOverlay, NULL },
{ "details", false, false, false, ActiveOverlay, NULL },
{ "dfn", false, false, false, ActiveOverlay, NULL },
{ "dialog", false, false, false, ActiveOverlay, dt_th_0 },
{ "dir", false, false, false, ActiveOverlay, NULL },
{ "div", false, true, false, ActiveOverlay, NULL },
{ "dl", false, false, false, ActiveOverlay, NULL },
{ "dt", false, true, false, ActiveOverlay, dl_0 },
{ "em", false, false, false, ActiveOverlay, NULL },
{ "embed", true, false, false, ActiveOverlay, NULL },
{ "fieldset", false, false, false, ActiveOverlay, NULL },
{ "figcaption", false, false, false, ActiveOverlay, figure_0 },
{ "figure", false, false, false, ActiveOverlay, NULL },
{ "font", false, true, false, ActiveOverlay, NULL },
{ "footer", false, false, false, ActiveOverlay, NULL },
{ "form", false, false, false, ActiveOverlay, NULL },
{ "frame", true, false, false, ActiveOverlay, frameset_0 },
{ "frameset", false, false, false, ActiveOverlay, NULL },
{ "h1", false, false, false, ActiveOverlay, NULL },
{ "h2", false, false, false, ActiveOverlay, NULL },
{ "h3", false, false, false, ActiveOverlay, NULL },
{ "h4", false, false, false, ActiveOverlay, NULL },
{ "h5", false, false, false, ActiveOverlay, NULL },
{ "h6", false, false, false, ActiveOverlay, NULL },
{ "head", false, false, false, ActiveOverlay, html_0 },
{ "header", false, false, false, ActiveOverlay, NULL },
{ "hgroup", false, false, false, ActiveOverlay, NULL },
{ "hr", true, false, false, ActiveOverlay, NULL },
{ "html", false, false, false, ActiveOverlay, NULL },
{ "i", false, false, false, ActiveOverlay, NULL },
{ "iframe", true, false, false, ActiveOverlay, NULL },
{ "img", true, false, false, ActiveOverlay, NULL },
{ "input", true, false, false, ActiveOverlay, NULL }, // somewhere under a form_0
{ "ins", false, false, false, ActiveOverlay, NULL },
{ "kbd", false, false, false, ActiveOverlay, NULL },
{ "keygen", true, false, false, ActiveOverlay, NULL },
{ "label", false, false, false, ActiveOverlay, NULL }, // somewhere under a form_0
{ "legend", false, false, false, ActiveOverlay, fieldset_0 },
{ "li", false, false, false, ActiveOverlay, ol_ul_menu_0 },
{ "link", true, false, false, ActiveOverlay, head_0 },
{ "map", false, false, false, ActiveOverlay, NULL },
{ "mark", false, false, false, ActiveOverlay, NULL },
{ "menu", false, false, false, ActiveOverlay, NULL },
{ "meta", true, false, false, ActiveOverlay, head_0 },
{ "meter", false, false, false, ActiveOverlay, NULL },
{ "nav", false, false, false, ActiveOverlay, NULL },
{ "noframes", false, false, false, ActiveOverlay, NULL },
{ "noscript", false, false, false, ActiveOverlay, NULL },
{ "object", false, false, false, ActiveOverlay, NULL },
{ "ol", false, true, false, ActiveOverlay, NULL },
{ "optgroup", false, false, false, ActiveOverlay, NULL },
{ "option", false, false, false, ActiveOverlay, optgroup_select_datalist_0 },
{ "output", false, false, false, ActiveOverlay, NULL },
{ "p", false, false, false, ActiveOverlay, NULL },
{ "param", true, false, false, ActiveOverlay, NULL },
{ "pre", false, false, false, ActiveOverlay, NULL },
{ "progress", false, false, false, ActiveOverlay, NULL },
{ "q", false, false, false, ActiveOverlay, NULL },
{ "rp", false, false, false, ActiveOverlay, ruby_0 },
{ "rt", false, false, false, ActiveOverlay, ruby_0 },
{ "ruby", false, false, false, ActiveOverlay, NULL },
{ "s", false, false, false, ActiveOverlay, NULL },
{ "samp", false, false, false, ActiveOverlay, NULL },
{ "script", false, false, true, ActiveOverlay, NULL },
{ "section", false, true, false, ActiveOverlay, NULL },
{ "select", false, false, false, ActiveOverlay, NULL },
{ "small", false, false, false, ActiveOverlay, NULL },
{ "source", false, false, false, ActiveOverlay, audio_video_0 },
{ "span", false, true, false, ActiveOverlay, NULL },
{ "strike", false, false, false, ActiveOverlay, NULL },
{ "strong", false, false, false, ActiveOverlay, NULL },
{ "style", false, false, false, ActiveOverlay, NULL },
{ "sub", false, false, false, ActiveOverlay, NULL },
{ "summary", false, false, false, ActiveOverlay, details_0 },
{ "sup", false, false, false, ActiveOverlay, NULL },
{ "table", false, false, false, ActiveOverlay, NULL },
{ "tbody", false, false, false, ActiveOverlay, table_0 },
{ "td", false, false, false, ActiveOverlay, tr_0 },
{ "textarea", false, false, false, ActiveOverlay, NULL },
{ "tfoot", false, false, false, ActiveOverlay, table_0 },
{ "th", false, false, false, ActiveOverlay, tr_0 },
{ "thead", false, false, false, ActiveOverlay, table_0 },
{ "time", false, false, false, ActiveOverlay, NULL },
{ "title", false, false, false, ActiveOverlay, head_0 },
{ "tr", false, false, false, ActiveOverlay, table_0 },
{ "track", true, false, false, ActiveOverlay, audio_video_0 },
{ "tt", false, false, false, ActiveOverlay, NULL },
{ "u", false, false, false, ActiveOverlay, NULL },
{ "ul", false, false, false, ActiveOverlay, NULL },
{ "var", false, false, false, ActiveOverlay, NULL },
{ "video", false, false, false, ActiveOverlay, NULL },
{ "wbr", true, false, false, ActiveOverlay, NULL },
};
static struct _Hints html_hints = {
"HTML",
html_hint_array,
sizeof(html_hint_array) / sizeof(*html_hint_array)
};
Hints
ox_hints_html() {
return &html_hints;
}
Hints
ox_hints_dup(Hints h) {
Hints nh = ALLOC(struct _Hints);
nh->hints = ALLOC_N(struct _Hint, h->size);
memcpy(nh->hints, h->hints, sizeof(struct _Hint) * h->size);
nh->size = h->size;
nh->name = h->name;
return nh;
}
void
ox_hints_destroy(Hints h) {
if (NULL != h && &html_hints != h) {
xfree(h->hints);
xfree(h);
}
}
Hint
ox_hint_find(Hints hints, const char *name) {
if (0 != hints) {
Hint lo = hints->hints;
Hint hi = hints->hints + hints->size - 1;
Hint mid;
int res;
if (0 == (res = strcasecmp(name, lo->name))) {
return lo;
} else if (0 > res) {
return 0;
}
if (0 == (res = strcasecmp(name, hi->name))) {
return hi;
} else if (0 < res) {
return 0;
}
while (1 < hi - lo) {
mid = lo + (hi - lo) / 2;
if (0 == (res = strcasecmp(name, mid->name))) {
return mid;
} else if (0 < res) {
lo = mid;
} else {
hi = mid;
}
}
}
return 0;
}
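/* ox_hint_find() above relies on html_hint_array being sorted by name, since
 * it is a case-insensitive binary search. The sketch below shows the same idea
 * on a tiny table. DemoHint, demo_hints and demo_hint_find are hypothetical
 * names used only for illustration, and the loop uses a half-open range rather
 * than the end-point checks Ox performs first.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical, trimmed-down stand-in for struct _Hint. */
typedef struct {
    const char *name;
    int         empty;
} DemoHint;

/* Must stay sorted case-insensitively by name for the search to work. */
static const DemoHint demo_hints[] = {
    { "a", 0 }, { "br", 1 }, { "div", 0 }, { "img", 1 }, { "p", 0 },
};

static const DemoHint *
demo_hint_find(const DemoHint *hints, size_t size, const char *name) {
    size_t lo = 0;
    size_t hi = size; /* search the half-open range [lo, hi) */

    assert(NULL != hints && NULL != name);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int    res = strcasecmp(name, hints[mid].name);

        if (0 == res) {
            return &hints[mid];
        } else if (0 < res) {
            lo = mid + 1; /* name sorts after mid */
        } else {
            hi = mid;     /* name sorts before mid */
        }
    }
    return NULL;
}
```

/* A new entry added out of order would make lookups miss silently, which is
 * why the table has to be kept sorted. */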
/* ox-2.8.2/ext/ox/buf.h */
/* buf.h
* Copyright (c) 2014, Peter Ohler
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* - Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* - Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* - Neither the name of Peter Ohler nor the names of its contributors may be
* used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __OX_BUF_H__
#define __OX_BUF_H__
#include <stdbool.h>
#include <unistd.h>
typedef struct _Buf {
char *head;
char *end;
char *tail;
int fd;
bool err;
char base[16384];
} *Buf;
inline static void
buf_init(Buf buf, int fd, long initial_size) {
if (sizeof(buf->base) < (size_t)initial_size) {
buf->head = ALLOC_N(char, initial_size);
buf->end = buf->head + initial_size - 1;
} else {
buf->head = buf->base;
buf->end = buf->base + sizeof(buf->base) - 1;
}
buf->tail = buf->head;
buf->fd = fd;
buf->err = false;
}
inline static void
buf_reset(Buf buf) {
buf->head = buf->base;
buf->tail = buf->head;
}
inline static void
buf_cleanup(Buf buf) {
if (buf->base != buf->head) {
xfree(buf->head); /* allocated with ALLOC_N/REALLOC_N, so release with xfree */
}
}
inline static size_t
buf_len(Buf buf) {
return buf->tail - buf->head;
}
inline static void
buf_append_string(Buf buf, const char *s, size_t slen) {
if (buf->err) {
return;
}
if (buf->end <= buf->tail + slen) {
if (0 != buf->fd) {
size_t len = buf->tail - buf->head;
if (len != (size_t)write(buf->fd, buf->head, len)) {
buf->err = true;
}
buf->tail = buf->head;
} else {
size_t len = buf->end - buf->head;
size_t toff = buf->tail - buf->head;
size_t new_len = len + slen + len / 2;
if (buf->base == buf->head) {
buf->head = ALLOC_N(char, new_len);
memcpy(buf->head, buf->base, len);
} else {
REALLOC_N(buf->head, char, new_len);
}
buf->tail = buf->head + toff;
buf->end = buf->head + new_len - 2;
}
}
if (0 < slen) {
memcpy(buf->tail, s, slen);
}
buf->tail += slen;
}
inline static void
buf_append(Buf buf, char c) {
if (buf->err) {
return;
}
if (buf->end <= buf->tail) {
if (0 != buf->fd) {
size_t len = buf->tail - buf->head;
if (len != (size_t)write(buf->fd, buf->head, len)) {
buf->err = true;
}
buf->tail = buf->head;
} else {
size_t len = buf->end - buf->head;
size_t toff = buf->tail - buf->head;
size_t new_len = len + len / 2;
if (buf->base == buf->head) {
buf->head = ALLOC_N(char, new_len);
memcpy(buf->head, buf->base, len);
} else {
REALLOC_N(buf->head, char, new_len);
}
buf->tail = buf->head + toff;
buf->end = buf->head + new_len - 2;
}
}
*buf->tail++ = c;
//*buf->tail = '\0'; // for debugging
}
inline static void
buf_finish(Buf buf) {
if (buf->err) {
return;
}
if (0 != buf->fd) {
size_t len = buf->tail - buf->head;
if (0 < len && len != (size_t)write(buf->fd, buf->head, len)) {
buf->err = true;
}
fsync(buf->fd);
buf->tail = buf->head;
}
}
#endif /* __OX_BUF_H__ */
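/* The buffer above starts in an inline array and only moves to the heap once
 * it overflows. Below is a minimal sketch of that growth pattern outside of
 * Ruby. DemoBuf and the demo_buf_* functions are hypothetical names, plain
 * malloc/realloc stand in for ALLOC_N/REALLOC_N, the inline array is shrunk to
 * 8 bytes to make the switch easy to trigger, and the file descriptor flush
 * path is left out.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *head;    /* start of the data, either base or a heap block */
    char *end;     /* one past the last usable byte */
    char *tail;    /* next write position */
    char  base[8]; /* inline storage used until it overflows */
} DemoBuf;

static void
demo_buf_init(DemoBuf *b) {
    b->head = b->base;
    b->end = b->base + sizeof(b->base);
    b->tail = b->head;
}

static void
demo_buf_append(DemoBuf *b, const char *s, size_t slen) {
    assert(NULL != b && NULL != s);
    if ((size_t)(b->end - b->tail) < slen) {
        size_t cap = (size_t)(b->end - b->head);
        size_t used = (size_t)(b->tail - b->head);
        size_t new_cap = cap + slen + cap / 2; /* same growth rule as buf.h */
        char  *grown;

        if (b->base == b->head) {
            /* First overflow: copy the inline bytes to a new heap block. */
            grown = malloc(new_cap);
            if (NULL != grown) {
                memcpy(grown, b->base, used);
            }
        } else {
            grown = realloc(b->head, new_cap);
        }
        if (NULL == grown) {
            abort(); /* the sketch does not try to recover from exhaustion */
        }
        b->head = grown;
        b->tail = grown + used;
        b->end = grown + new_cap;
    }
    memcpy(b->tail, s, slen);
    b->tail += slen;
}

static size_t
demo_buf_len(const DemoBuf *b) {
    return (size_t)(b->tail - b->head);
}

static void
demo_buf_cleanup(DemoBuf *b) {
    if (b->base != b->head) {
        free(b->head);
    }
    demo_buf_init(b);
}
```

/* Short outputs never touch the allocator in this scheme; only output larger
 * than the inline array pays for a heap block. */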
# ox-2.8.2/ext/ox/extconf.rb
require 'mkmf'
extension_name = 'ox'
dir_config(extension_name)
parts = RUBY_DESCRIPTION.split(' ')
type = parts[0].downcase()
type = 'ree' if 'ruby' == type && RUBY_DESCRIPTION.include?('Ruby Enterprise Edition')
is_windows = RbConfig::CONFIG['host_os'] =~ /(mingw|mswin)/
platform = RUBY_PLATFORM
version = RUBY_VERSION.split('.')
puts ">>>>> Creating Makefile for #{type} version #{RUBY_VERSION} on #{platform} <<<<<"
dflags = {
'RUBY_TYPE' => type,
(type.upcase + '_RUBY') => nil,
'RUBY_VERSION' => RUBY_VERSION,
'RUBY_VERSION_MAJOR' => version[0],
'RUBY_VERSION_MINOR' => version[1],
'RUBY_VERSION_MICRO' => version[2],
'HAS_RB_TIME_TIMESPEC' => ('ruby' == type && ('1.9.3' == RUBY_VERSION)) ? 1 : 0,
#'HAS_RB_TIME_TIMESPEC' => ('ruby' == type && ('1.9.3' == RUBY_VERSION || '2' <= version[0])) ? 1 : 0,
'HAS_TM_GMTOFF' => ('ruby' == type && (('1' == version[0] && '9' == version[1]) || '2' <= version[0]) &&
!(platform.include?('cygwin') || platform.include?('solaris') || platform.include?('linux') || RUBY_PLATFORM =~ /(win|w)32$/)) ? 1 : 0,
'HAS_ENCODING_SUPPORT' => (('ruby' == type || 'rubinius' == type || 'macruby' == type) &&
(('1' == version[0] && '9' == version[1]) || '2' <= version[0])) ? 1 : 0,
'HAS_ONIG' => (('ruby' == type || 'jruby' == type || 'rubinius' == type) &&
(('1' == version[0] && '9' == version[1]) || '2' <= version[0])) ? 1 : 0,
'HAS_PRIVATE_ENCODING' => ('jruby' == type && '1' == version[0] && '9' == version[1]) ? 1 : 0,
'HAS_NANO_TIME' => ('ruby' == type && ('1' == version[0] && '9' == version[1]) || '2' <= version[0]) ? 1 : 0,
'HAS_RSTRUCT' => ('ruby' == type || 'ree' == type) ? 1 : 0,
'HAS_IVAR_HELPERS' => ('ruby' == type && !is_windows && (('1' == version[0] && '9' == version[1]) || '2' <= version[0])) ? 1 : 0,
'HAS_PROC_WITH_BLOCK' => ('ruby' == type && ('1' == version[0] && '9' == version[1]) || '2' <= version[0]) ? 1 : 0,
'HAS_GC_GUARD' => ('jruby' != type && 'rubinius' != type) ? 1 : 0,
'HAS_BIGDECIMAL' => ('jruby' != type) ? 1 : 0,
'HAS_TOP_LEVEL_ST_H' => ('ree' == type || ('ruby' == type && '1' == version[0] && '8' == version[1])) ? 1 : 0,
'NEEDS_UIO' => (RUBY_PLATFORM =~ /(win|w)32$/) ? 0 : 1,
'HAS_DATA_OBJECT_WRAP' => ('ruby' == type && '2' == version[0] && '3' <= version[1]) ? 1 : 0,
'UNIFY_FIXNUM_AND_BIGNUM' => ('ruby' == type && '2' == version[0] && '4' <= version[1]) ? 1 : 0,
}
if RUBY_PLATFORM =~ /(win|w)32$/ || RUBY_PLATFORM =~ /solaris2\.10/
dflags['NEEDS_STPCPY'] = nil
end
if ['i386-darwin10.0.0', 'x86_64-darwin10.8.0'].include? RUBY_PLATFORM
dflags['NEEDS_STPCPY'] = nil
dflags['HAS_IVAR_HELPERS'] = 0 if ('ruby' == type && '1.9.1' == RUBY_VERSION)
elsif 'x86_64-linux' == RUBY_PLATFORM && '1.9.3' == RUBY_VERSION && '2011-10-30' == RUBY_RELEASE_DATE
begin
dflags['NEEDS_STPCPY'] = nil if `more /etc/issue`.include?('CentOS release 5.4')
rescue Exception
end
end
dflags.each do |k,v|
if v.nil?
$CPPFLAGS += " -D#{k}"
else
$CPPFLAGS += " -D#{k}=#{v}"
end
end
$CPPFLAGS += ' -Wall'
#puts "*** $CPPFLAGS: #{$CPPFLAGS}"
create_makefile(extension_name)
%x{make clean}
/* ox-2.8.2/ext/ox/helper.h */
/* helper.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_HELPER_H__
#define __OX_HELPER_H__
#include "type.h"
#define HELPER_STACK_INC 16
typedef struct _Helper {
ID var; /* Object var ID */
VALUE obj; /* object created or Qundef if not appropriate */
Type type; /* type of object in obj */
} *Helper;
typedef struct _HelperStack {
struct _Helper base[HELPER_STACK_INC];
Helper head; /* current stack */
Helper end; /* stack end */
Helper tail; /* pointer to one past last element name on stack */
} *HelperStack;
inline static void
helper_stack_init(HelperStack stack) {
stack->head = stack->base;
stack->end = stack->base + sizeof(stack->base) / sizeof(struct _Helper);
stack->tail = stack->head;
}
inline static int
helper_stack_empty(HelperStack stack) {
return (stack->head == stack->tail);
}
inline static int
helper_stack_depth(HelperStack stack) {
return (int)(stack->tail - stack->head);
}
inline static void
helper_stack_cleanup(HelperStack stack) {
if (stack->base != stack->head) {
xfree(stack->head);
stack->head = stack->base;
}
}
inline static Helper
helper_stack_push(HelperStack stack, ID var, VALUE obj, Type type) {
if (stack->end <= stack->tail) {
size_t len = stack->end - stack->head;
size_t toff = stack->tail - stack->head;
if (stack->base == stack->head) {
stack->head = ALLOC_N(struct _Helper, len + HELPER_STACK_INC);
memcpy(stack->head, stack->base, sizeof(struct _Helper) * len);
} else {
REALLOC_N(stack->head, struct _Helper, len + HELPER_STACK_INC);
}
stack->tail = stack->head + toff;
stack->end = stack->head + len + HELPER_STACK_INC;
}
stack->tail->var = var;
stack->tail->obj = obj;
stack->tail->type = type;
stack->tail++;
return stack->tail - 1;
}
inline static Helper
helper_stack_peek(HelperStack stack) {
if (stack->head < stack->tail) {
return stack->tail - 1;
}
return 0;
}
inline static Helper
helper_stack_pop(HelperStack stack) {
if (stack->head < stack->tail) {
stack->tail--;
return stack->tail;
}
return 0;
}
#endif /* __OX_HELPER_H__ */
/* ox-2.8.2/ext/ox/sax_as.c */
/* sax_as.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <strings.h>
#include <sys/types.h>
#if NEEDS_UIO
#include <sys/uio.h>
#endif
#include <unistd.h>
#include <time.h>
#include "ruby.h"
#include "ox.h"
#include "sax.h"
static VALUE
parse_double_time(const char *text) {
long v = 0;
long v2 = 0;
const char *dot = 0;
char c;
for (; '.' != *text; text++) {
c = *text;
if (c < '0' || '9' < c) {
return Qnil;
}
v = 10 * v + (long)(c - '0');
}
dot = text++;
for (; '\0' != *text && text - dot <= 6; text++) {
c = *text;
if (c < '0' || '9' < c) {
return Qnil;
}
v2 = 10 * v2 + (long)(c - '0');
}
for (; text - dot <= 9; text++) {
v2 *= 10;
}
#if HAS_NANO_TIME
return rb_time_nano_new(v, v2);
#else
return rb_time_new(v, v2 / 1000);
#endif
}
typedef struct _Tp {
int cnt;
char end;
char alt;
} *Tp;
static VALUE
parse_xsd_time(const char *text) {
long cargs[10];
long *cp = cargs;
long v;
int i;
char c = '\0';
struct _Tp tpa[10] = { { 4, '-', '-' },
{ 2, '-', '-' },
{ 2, 'T', ' ' },
{ 2, ':', ':' },
{ 2, ':', ':' },
{ 2, '.', '.' },
{ 9, '+', '-' },
{ 2, ':', ':' },
{ 2, '\0', '\0' },
{ 0, '\0', '\0' } };
Tp tp = tpa;
struct tm tm;
memset(cargs, 0, sizeof(cargs));
for (; 0 != tp->cnt; tp++) {
for (i = tp->cnt, v = 0; 0 < i ; text++, i--) {
c = *text;
if (c < '0' || '9' < c) {
if ('\0' == c || tp->end == c || tp->alt == c) {
break;
}
return Qnil;
}
v = 10 * v + (long)(c - '0');
}
if ('\0' == c) {
break;
}
c = *text++;
if (tp->end != c && tp->alt != c) {
return Qnil;
}
*cp++ = v;
}
tm.tm_year = (int)cargs[0] - 1900;
tm.tm_mon = (int)cargs[1] - 1;
tm.tm_mday = (int)cargs[2];
tm.tm_hour = (int)cargs[3];
tm.tm_min = (int)cargs[4];
tm.tm_sec = (int)cargs[5];
#if HAS_NANO_TIME
return rb_time_nano_new(mktime(&tm), cargs[6]);
#else
return rb_time_new(mktime(&tm), cargs[6] / 1000);
#endif
}
/* call-seq: as_s()
*
* *return* value as a String.
*/
static VALUE
sax_value_as_s(VALUE self) {
SaxDrive dr = DATA_PTR(self);
VALUE rs;
if ('\0' == *dr->buf.str) {
return Qnil;
}
if (dr->options.convert_special) {
ox_sax_collapse_special(dr, dr->buf.str, dr->buf.pos, dr->buf.line, dr->buf.col);
}
switch (dr->options.skip) {
case CrSkip:
buf_collapse_return(dr->buf.str);
break;
case SpcSkip:
buf_collapse_white(dr->buf.str);
break;
default:
break;
}
rs = rb_str_new2(dr->buf.str);
#if HAS_ENCODING_SUPPORT
if (0 != dr->encoding) {
rb_enc_associate(rs, dr->encoding);
}
#elif HAS_PRIVATE_ENCODING
if (Qnil != dr->encoding) {
rb_funcall(rs, ox_force_encoding_id, 1, dr->encoding);
}
#endif
return rs;
}
/* call-seq: as_sym()
*
* *return* value as a Symbol.
*/
static VALUE
sax_value_as_sym(VALUE self) {
SaxDrive dr = DATA_PTR(self);
if ('\0' == *dr->buf.str) {
return Qnil;
}
return str2sym(dr, dr->buf.str, 0);
}
/* call-seq: as_f()
*
* *return* value as a Float.
*/
static VALUE
sax_value_as_f(VALUE self) {
SaxDrive dr = DATA_PTR(self);
if ('\0' == *dr->buf.str) {
return Qnil;
}
return rb_float_new(strtod(dr->buf.str, 0));
}
/* call-seq: as_i()
*
* *return* value as a Fixnum.
*/
static VALUE
sax_value_as_i(VALUE self) {
SaxDrive dr = DATA_PTR(self);
const char *s = dr->buf.str;
long n = 0;
int neg = 0;
if ('\0' == *s) {
return Qnil;
}
if ('-' == *s) {
neg = 1;
s++;
} else if ('+' == *s) {
s++;
}
for (; '\0' != *s; s++) {
if ('0' <= *s && *s <= '9') {
n = n * 10 + (*s - '0');
} else {
rb_raise(ox_arg_error_class, "Not a valid Fixnum.\n");
}
}
if (neg) {
n = -n;
}
return LONG2NUM(n);
}
/* call-seq: as_time()
*
* *return* value as a Time.
*/
static VALUE
sax_value_as_time(VALUE self) {
SaxDrive dr = DATA_PTR(self);
const char *str = dr->buf.str;
VALUE t;
if ('\0' == *str) {
return Qnil;
}
if (Qnil == (t = parse_double_time(str)) &&
Qnil == (t = parse_xsd_time(str))) {
VALUE args[1];
/*printf("**** time parse\n"); */
*args = rb_str_new2(str);
t = rb_funcall2(ox_time_class, ox_parse_id, 1, args);
}
return t;
}
/* call-seq: as_bool()
*
* *return* value as a boolean.
*/
static VALUE
sax_value_as_bool(VALUE self) {
return (0 == strcasecmp("true", ((SaxDrive)DATA_PTR(self))->buf.str)) ? Qtrue : Qfalse;
}
/* call-seq: empty()
*
* *return* true if the value is empty.
*/
static VALUE
sax_value_empty(VALUE self) {
return ('\0' == *((SaxDrive)DATA_PTR(self))->buf.str) ? Qtrue : Qfalse;
}
/* Document-class: Ox::Sax::Value
*
* Values passed to the SAX callbacks. They can be converted to various
* types with the _as_x()_ methods.
*/
void
ox_sax_define() {
#if 0
ox = rb_define_module("Ox");
sax_module = rb_define_class_under(ox, "Sax", rb_cObject);
#endif
VALUE sax_module = rb_const_get_at(Ox, rb_intern("Sax"));
ox_sax_value_class = rb_define_class_under(sax_module, "Value", rb_cObject);
rb_define_method(ox_sax_value_class, "as_s", sax_value_as_s, 0);
rb_define_method(ox_sax_value_class, "as_sym", sax_value_as_sym, 0);
rb_define_method(ox_sax_value_class, "as_i", sax_value_as_i, 0);
rb_define_method(ox_sax_value_class, "as_f", sax_value_as_f, 0);
rb_define_method(ox_sax_value_class, "as_time", sax_value_as_time, 0);
rb_define_method(ox_sax_value_class, "as_bool", sax_value_as_bool, 0);
rb_define_method(ox_sax_value_class, "empty?", sax_value_empty, 0);
}
/* ox-2.8.2/ext/ox/sax_hint.h */
/* hint.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_HINT_H__
#define __OX_HINT_H__
#include <stdbool.h>
typedef enum {
ActiveOverlay = 0,
InactiveOverlay = 'i',
BlockOverlay = 'b',
OffOverlay = 'o',
AbortOverlay = 'a',
NestOverlay = 'n', // nest flag is ignored
} Overlay;
typedef struct _Hint {
const char *name;
char empty; // element has no content; if not closed explicitly it is closed automatically, not an error
char nest; // nesting allowed
char jump; // jump to end
char overlay;// Overlay
const char **parents;
} *Hint;
typedef struct _Hints {
const char *name;
Hint hints; // array of hints
int size;
} *Hints;
extern Hints ox_hints_html(void);
extern Hint ox_hint_find(Hints hints, const char *name);
extern Hints ox_hints_dup(Hints h);
extern void ox_hints_destroy(Hints h);
#endif /* __OX_HINT_H__ */
/* ox-2.8.2/ext/ox/base64.h */
/* base64.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __BASE64_H__
#define __BASE64_H__
typedef unsigned char uchar;
#define b64_size(len) (((len) + 2) / 3 * 4)
extern unsigned long b64_orig_size(const char *text);
extern void to_base64(const uchar *src, int len, char *b64);
extern void from_base64(const char *b64, uchar *str);
#endif /* __BASE64_H__ */
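/* The b64_size() macro above rounds the input length up to a whole number of
 * 3-byte groups and emits 4 characters per group. A small worked sketch of
 * that arithmetic follows; demo_b64_size is a hypothetical function form of
 * the macro, written only for illustration. Whether callers need an extra byte
 * for a terminator depends on to_base64(), which is not shown in this header.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Encoded length, padding included, for len input bytes. */
static size_t
demo_b64_size(size_t len) {
    size_t groups = (len + 2) / 3; /* integer division rounds the group count up */

    assert(groups <= (size_t)-1 / 4); /* guard the multiplication against overflow */
    return groups * 4;
}
```

/* For example 1, 2 and 3 input bytes all encode to 4 characters, while 4 input
 * bytes need 8. */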
/* ox-2.8.2/ext/ox/ox.c */
/* ox.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdlib.h>
#include <errno.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include "ruby.h"
#include "ox.h"
#include "sax.h"
/* maximum to allocate on the stack, arbitrary limit */
#define SMALL_XML 4096
#define WITH_CACHE_TESTS 0
typedef struct _YesNoOpt {
VALUE sym;
char *attr;
} *YesNoOpt;
void Init_ox();
VALUE Ox = Qnil;
ID ox_abort_id;
ID ox_at_column_id;
ID ox_at_content_id;
ID ox_at_id;
ID ox_at_line_id;
ID ox_at_pos_id;
ID ox_at_value_id;
ID ox_attr_id;
ID ox_attr_value_id;
ID ox_attributes_id;
ID ox_attrs_done_id;
ID ox_beg_id;
ID ox_cdata_id;
ID ox_comment_id;
ID ox_den_id;
ID ox_doctype_id;
ID ox_end_element_id;
ID ox_end_id;
ID ox_end_instruct_id;
ID ox_error_id;
ID ox_excl_id;
ID ox_external_encoding_id;
ID ox_fileno_id;
ID ox_force_encoding_id;
ID ox_inspect_id;
ID ox_instruct_id;
ID ox_jd_id;
ID ox_keys_id;
ID ox_local_id;
ID ox_mesg_id;
ID ox_message_id;
ID ox_new_id;
ID ox_nodes_id;
ID ox_num_id;
ID ox_parse_id;
ID ox_pos_id;
ID ox_read_id;
ID ox_readpartial_id;
ID ox_start_element_id;
ID ox_string_id;
ID ox_text_id;
ID ox_to_c_id;
ID ox_to_s_id;
ID ox_to_sym_id;
ID ox_tv_nsec_id;
ID ox_tv_sec_id;
ID ox_tv_usec_id;
ID ox_value_id;
VALUE ox_encoding_sym;
VALUE ox_version_sym;
VALUE ox_standalone_sym;
VALUE ox_indent_sym;
VALUE ox_size_sym;
VALUE ox_empty_string;
VALUE ox_zero_fixnum;
VALUE ox_sym_bank; // Array
VALUE ox_arg_error_class;
VALUE ox_bag_clas;
VALUE ox_bigdecimal_class;
VALUE ox_cdata_clas;
VALUE ox_comment_clas;
VALUE ox_raw_clas;
VALUE ox_date_class;
VALUE ox_doctype_clas;
VALUE ox_document_clas;
VALUE ox_element_clas;
VALUE ox_instruct_clas;
VALUE ox_parse_error_class;
VALUE ox_stringio_class;
VALUE ox_struct_class;
VALUE ox_time_class;
Cache ox_symbol_cache = 0;
Cache ox_class_cache = 0;
Cache ox_attr_cache = 0;
static VALUE abort_sym;
static VALUE active_sym;
static VALUE auto_define_sym;
static VALUE auto_sym;
static VALUE block_sym;
static VALUE circular_sym;
static VALUE convert_special_sym;
static VALUE effort_sym;
static VALUE generic_sym;
static VALUE hash_no_attrs_sym;
static VALUE hash_sym;
static VALUE inactive_sym;
static VALUE invalid_replace_sym;
static VALUE limited_sym;
static VALUE margin_sym;
static VALUE mode_sym;
static VALUE nest_ok_sym;
static VALUE object_sym;
static VALUE off_sym;
static VALUE opt_format_sym;
static VALUE optimized_sym;
static VALUE overlay_sym;
static VALUE skip_none_sym;
static VALUE skip_off_sym;
static VALUE skip_return_sym;
static VALUE skip_sym;
static VALUE skip_white_sym;
static VALUE smart_sym;
static VALUE strict_sym;
static VALUE strip_namespace_sym;
static VALUE symbolize_keys_sym;
static VALUE symbolize_sym;
static VALUE tolerant_sym;
static VALUE trace_sym;
static VALUE with_dtd_sym;
static VALUE with_instruct_sym;
static VALUE with_xml_sym;
static VALUE xsd_date_sym;
static ID encoding_id;
static ID has_key_id;
#if HAS_ENCODING_SUPPORT
rb_encoding *ox_utf8_encoding = 0;
#elif HAS_PRIVATE_ENCODING
VALUE ox_utf8_encoding = Qnil;
#else
void *ox_utf8_encoding = 0;
#endif
struct _Options ox_default_options = {
{ '\0' }, /* encoding */
{ '\0' }, /* margin */
2, /* indent */
0, /* trace */
0, /* margin_len */
No, /* with_dtd */
No, /* with_xml */
No, /* with_instruct */
No, /* circular */
No, /* xsd_date */
NoMode, /* mode */
StrictEffort, /* effort */
Yes, /* sym_keys */
SpcSkip, /* skip */
No, /* smart */
1, /* convert_special */
No, /* allow_invalid */
{ '\0' }, /* inv_repl */
{ '\0' }, /* strip_ns */
NULL, /* html_hints */
#if HAS_PRIVATE_ENCODING
Qnil /* rb_enc */
#else
0 /* rb_enc */
#endif
};
extern ParseCallbacks ox_obj_callbacks;
extern ParseCallbacks ox_gen_callbacks;
extern ParseCallbacks ox_limited_callbacks;
extern ParseCallbacks ox_nomode_callbacks;
extern ParseCallbacks ox_hash_callbacks;
extern ParseCallbacks ox_hash_no_attrs_callbacks;
static void parse_dump_options(VALUE ropts, Options copts);
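/* Skips a leading UTF-8 byte order mark (BOM), if present, and records UTF-8
* as the encoding in options. Raises a parse error when the first byte is 0xEF
* but the next two bytes do not complete a UTF-8 BOM. Returns a pointer to the
* first byte after the BOM, or xml itself when there is no BOM.
*/
static char*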
defuse_bom(char *xml, Options options) {
switch ((uint8_t)*xml) {
case 0xEF: /* UTF-8 */
if (0xBB == (uint8_t)xml[1] && 0xBF == (uint8_t)xml[2]) {
options->rb_enc = ox_utf8_encoding;
xml += 3;
} else {
rb_raise(ox_parse_error_class, "Invalid BOM in XML string.\n");
}
break;
#if 0
case 0xFE: /* UTF-16BE */
if (0xFF == (uint8_t)xml[1]) {
options->rb_enc = ox_utf16be_encoding;
xml += 2;
} else {
rb_raise(ox_parse_error_class, "Invalid BOM in XML string.\n");
}
break;
case 0xFF: /* UTF-16LE or UTF-32LE */
if (0xFE == (uint8_t)xml[1]) {
if (0x00 == (uint8_t)xml[2] && 0x00 == (uint8_t)xml[3]) {
options->rb_enc = ox_utf32le_encoding;
xml += 4;
} else {
options->rb_enc = ox_utf16le_encoding;
xml += 2;
}
} else {
rb_raise(ox_parse_error_class, "Invalid BOM in XML string.\n");
}
break;
case 0x00: /* UTF-32BE */
if (0x00 == (uint8_t)xml[1] && 0xFE == (uint8_t)xml[2] && 0xFF == (uint8_t)xml[3]) {
options->rb_enc = ox_utf32be_encoding;
xml += 4;
} else {
rb_raise(ox_parse_error_class, "Invalid BOM in XML string.\n");
}
break;
#endif
default:
/* Let it fail if there is a BOM that is not UTF-8. Other BOM options are not ASCII compatible. */
break;
}
return xml;
}
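/* Builds a Ruby Hash that maps each hint's element name (a String) to the
* Symbol for its current overlay setting.
*/
static VALUE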
hints_to_overlay(Hints hints) {
volatile VALUE overlay = rb_hash_new();
Hint h;
int i;
VALUE ov;
for (i = hints->size, h = hints->hints; 0 < i; i--, h++) {
switch (h->overlay) {
case InactiveOverlay: ov = inactive_sym; break;
case BlockOverlay: ov = block_sym; break;
case OffOverlay: ov = off_sym; break;
case AbortOverlay: ov = abort_sym; break;
case NestOverlay: ov = nest_ok_sym; break;
case ActiveOverlay:
default: ov = active_sym; break;
}
rb_hash_aset(overlay, rb_str_new2(h->name), ov);
}
return overlay;
}
/* call-seq: default_options() => Hash
*
* Returns the default load and dump options as a Hash. The options are
* - _:margin_ [String] left margin to inset when dumping
* - _:indent_ [Fixnum] number of spaces to indent each element in an XML document
* - _:trace_ [Fixnum] trace level where 0 is silent
* - _:encoding_ [String] character encoding for the XML file
* - _:with_dtd_ [true|false|nil] include DTD in the dump
* - _:with_instruct_ [true|false|nil] include instructions in the dump
* - _:with_xml_ [true|false|nil] include XML prolog in the dump
* - _:circular_ [true|false|nil] support circular references while dumping
* - _:xsd_date_ [true|false|nil] use XSD date format instead of decimal format
* - _:mode_ [:object|:generic|:limited|:hash|:hash_no_attrs|nil] load method to use for XML
* - _:effort_ [:strict|:tolerant|:auto_define] set the tolerance level for loading
* - _:symbolize_keys_ [true|false|nil] symbolize element attribute keys or leave as Strings
* - _:skip_ [:skip_none|:skip_return|:skip_white|:skip_off] determines how to handle white space in text
* - _:smart_ [true|false|nil] flag indicating the SAX parser uses hints if available (use with html)
* - _:convert_special_ [true|false|nil] flag indicating special characters like < are converted with the SAX parser
* - _:invalid_replace_ [nil|String] replacement string for invalid XML characters on dump. nil means invalid characters are included anyway, as hex. A String, limited to 10 characters, replaces each invalid character.
* - _:strip_namespace_ [String|true|false] false or "" results in no namespace stripping. A string of "*" or true will strip all namespaces. Any other non-empty string indicates that matching namespaces will be stripped.
* - _:overlay_ [Hash] a Hash of keys that match html element names and values that are one of
* - _:active_ - make the normal callback for the element
* - _:nest_ok_ - active but the nesting check is ignored
* - _:inactive_ - do not make the element start, end, or attribute callbacks for this element only
* - _:block_ - block this and all children callbacks
* - _:off_ - block this element and its children unless the child element is active
* - _:abort_ - abort the html processing and return
*
* *return* [Hash] all current option settings.
*
* Note that an indent of less than zero results in tight, single-line output
* unless the text in the XML fields contains newline characters.
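*
* A minimal sketch of reading the defaults from Ruby (what is printed depends
* on the options currently in effect):
*
*   require 'ox'
*
*   opts = Ox.default_options
*   puts opts[:indent]          # spaces per nesting level
*   puts opts[:mode].inspect    # nil unless a default mode has been set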
*/
static VALUE
get_def_opts(VALUE self) {
VALUE opts = rb_hash_new();
int elen = (int)strlen(ox_default_options.encoding);
rb_hash_aset(opts, ox_encoding_sym, (0 == elen) ? Qnil : rb_str_new(ox_default_options.encoding, elen));
rb_hash_aset(opts, margin_sym, rb_str_new(ox_default_options.margin, ox_default_options.margin_len));
rb_hash_aset(opts, ox_indent_sym, INT2FIX(ox_default_options.indent));
rb_hash_aset(opts, trace_sym, INT2FIX(ox_default_options.trace));
rb_hash_aset(opts, with_dtd_sym, (Yes == ox_default_options.with_dtd) ? Qtrue : ((No == ox_default_options.with_dtd) ? Qfalse : Qnil));
rb_hash_aset(opts, with_xml_sym, (Yes == ox_default_options.with_xml) ? Qtrue : ((No == ox_default_options.with_xml) ? Qfalse : Qnil));
rb_hash_aset(opts, with_instruct_sym, (Yes == ox_default_options.with_instruct) ? Qtrue : ((No == ox_default_options.with_instruct) ? Qfalse : Qnil));
rb_hash_aset(opts, circular_sym, (Yes == ox_default_options.circular) ? Qtrue : ((No == ox_default_options.circular) ? Qfalse : Qnil));
rb_hash_aset(opts, xsd_date_sym, (Yes == ox_default_options.xsd_date) ? Qtrue : ((No == ox_default_options.xsd_date) ? Qfalse : Qnil));
rb_hash_aset(opts, symbolize_keys_sym, (Yes == ox_default_options.sym_keys) ? Qtrue : ((No == ox_default_options.sym_keys) ? Qfalse : Qnil));
rb_hash_aset(opts, smart_sym, (Yes == ox_default_options.smart) ? Qtrue : ((No == ox_default_options.smart) ? Qfalse : Qnil));
rb_hash_aset(opts, convert_special_sym, (ox_default_options.convert_special) ? Qtrue : Qfalse);
switch (ox_default_options.mode) {
case ObjMode: rb_hash_aset(opts, mode_sym, object_sym); break;
case GenMode: rb_hash_aset(opts, mode_sym, generic_sym); break;
case LimMode: rb_hash_aset(opts, mode_sym, limited_sym); break;
case HashMode: rb_hash_aset(opts, mode_sym, hash_sym); break;
case HashNoAttrMode: rb_hash_aset(opts, mode_sym, hash_no_attrs_sym); break;
case NoMode:
default: rb_hash_aset(opts, mode_sym, Qnil); break;
}
switch (ox_default_options.effort) {
case StrictEffort: rb_hash_aset(opts, effort_sym, strict_sym); break;
case TolerantEffort: rb_hash_aset(opts, effort_sym, tolerant_sym); break;
case AutoEffort: rb_hash_aset(opts, effort_sym, auto_define_sym); break;
case NoEffort:
default: rb_hash_aset(opts, effort_sym, Qnil); break;
}
switch (ox_default_options.skip) {
case OffSkip: rb_hash_aset(opts, skip_sym, skip_off_sym); break;
case NoSkip: rb_hash_aset(opts, skip_sym, skip_none_sym); break;
case CrSkip: rb_hash_aset(opts, skip_sym, skip_return_sym); break;
case SpcSkip: rb_hash_aset(opts, skip_sym, skip_white_sym); break;
default: rb_hash_aset(opts, skip_sym, Qnil); break;
}
if (Yes == ox_default_options.allow_invalid) {
rb_hash_aset(opts, invalid_replace_sym, Qnil);
} else {
rb_hash_aset(opts, invalid_replace_sym, rb_str_new(ox_default_options.inv_repl + 1, (int)*ox_default_options.inv_repl));
}
if ('\0' == *ox_default_options.strip_ns) {
rb_hash_aset(opts, strip_namespace_sym, Qfalse);
} else if ('*' == *ox_default_options.strip_ns && '\0' == ox_default_options.strip_ns[1]) {
rb_hash_aset(opts, strip_namespace_sym, Qtrue);
} else {
rb_hash_aset(opts, strip_namespace_sym, rb_str_new(ox_default_options.strip_ns, strlen(ox_default_options.strip_ns)));
}
if (NULL == ox_default_options.html_hints) {
//rb_hash_aset(opts, overlay_sym, hints_to_overlay(ox_hints_html()));
rb_hash_aset(opts, overlay_sym, Qnil);
} else {
rb_hash_aset(opts, overlay_sym, hints_to_overlay(ox_default_options.html_hints));
}
return opts;
}
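/* rb_hash_foreach() callback: applies one overlay Hash entry to the hint with
* the matching element name. Unknown element names and unrecognized values are
* silently ignored.
*/
static int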
set_overlay(VALUE key, VALUE value, VALUE ctx) {
Hints hints = (Hints)ctx;
Hint hint;
if (NULL != (hint = ox_hint_find(hints, StringValuePtr(key)))) {
if (active_sym == value) {
hint->overlay = ActiveOverlay;
} else if (inactive_sym == value) {
hint->overlay = InactiveOverlay;
} else if (block_sym == value) {
hint->overlay = BlockOverlay;
} else if (nest_ok_sym == value) {
hint->overlay = NestOverlay;
} else if (off_sym == value) {
hint->overlay = OffOverlay;
} else if (abort_sym == value) {
hint->overlay = AbortOverlay;
}
}
return ST_CONTINUE;
}
/* call-seq: sax_html_overlay() => Hash
*
* Returns an overlay hash that can be modified and used as an overlay in the
* default options or in the sax_html() function call. Values for the keys are:
* - _:active_ - make the normal callback for the element
* - _:nest_ok_ - active but ignore nest check
* - _:inactive_ - do not make the element start, end, or attribute callbacks for this element only
* - _:block_ - block this and all children callbacks
* - _:off_ - block this element and its children unless the child element is active
* - _:abort_ - abort the html processing and return
*
* *return* [Hash] default SAX HTML settings
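*
* A sketch of blocking one element, assuming +handler+ responds to the Ox::Sax
* callbacks and +io+ is an IO open on an HTML document (both are placeholders,
* not defined here):
*
*   overlay = Ox.sax_html_overlay
*   overlay['script'] = :block   # keys are element names as Strings
*   Ox.sax_html(handler, io, :overlay => overlay)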
*/
static VALUE
sax_html_overlay(VALUE self) {
return hints_to_overlay(ox_hints_html());
}
/* call-seq: default_options=(opts)
*
* Sets the default options for load and dump.
* - +opts+ [Hash] options to change
* - _:margin_ [String] left margin to inset when dumping
* - _:indent_ [Fixnum] number of spaces to indent each element in an XML document
* - _:trace_ [Fixnum] trace level where 0 is silent
* - _:encoding_ [String] character encoding for the XML file
* - _:with_dtd_ [true|false|nil] include DTD in the dump
* - _:with_instruct_ [true|false|nil] include instructions in the dump
* - _:with_xml_ [true|false|nil] include XML prolog in the dump
* - _:circular_ [true|false|nil] support circular references while dumping
* - _:xsd_date_ [true|false|nil] use XSD date format instead of decimal format
* - _:mode_ [:object|:generic|:limited|:hash|:hash_no_attrs|nil] load method to use for XML
* - _:effort_ [:strict|:tolerant|:auto_define] set the tolerance level for loading
* - _:symbolize_keys_ [true|false|nil] symbolize element attribute keys or leave as Strings
* - _:skip_ [:skip_none|:skip_return|:skip_white|:skip_off] determines how to handle white space in text
* - _:smart_ [true|false|nil] flag indicating the SAX parser uses hints if available (use with html)
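* - _:convert_special_ [true|false|nil] flag indicating special characters like < are converted with the SAX parser, nil leaves the current setting unchanged
* - _:invalid_replace_ [nil|String] replacement string for invalid XML characters on dump. nil means invalid characters are included anyway, as hex. A String, limited to 10 characters, replaces each invalid character.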
* - _:strip_namespace_ [nil|String|true|false] "" or false result in no namespace stripping. A string of "*" or true will strip all namespaces. Any other non-empty string indicates that matching namespaces will be stripped.
* - _:overlay_ [Hash] a Hash of keys that match html element names and values that are one of
* - _:active_ - make the normal callback for the element
* - _:nest_ok_ - active but ignore nest check
* - _:inactive_ - do not make the element start, end, or attribute callbacks for this element only
* - _:block_ - block this and all children callbacks
* - _:off_ - block this element and its children unless the child element is active
* - _:abort_ - abort the html processing and return
*
* *return* [nil]
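*
* A sketch of changing a single setting. Keys missing from +opts+ are read as
* nil, and a nil :encoding, :mode, :effort, :skip, :invalid_replace, :overlay,
* or true/false flag resets that option, so merging into the current defaults
* is one way to leave the other settings as they are:
*
*   require 'ox'
*
*   Ox.default_options = Ox.default_options.merge(:indent => 0)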
*/
static VALUE
set_def_opts(VALUE self, VALUE opts) {
struct _YesNoOpt ynos[] = {
{ with_xml_sym, &ox_default_options.with_xml },
{ with_dtd_sym, &ox_default_options.with_dtd },
{ with_instruct_sym, &ox_default_options.with_instruct },
{ xsd_date_sym, &ox_default_options.xsd_date },
{ circular_sym, &ox_default_options.circular },
{ symbolize_keys_sym, &ox_default_options.sym_keys },
{ smart_sym, &ox_default_options.smart },
{ Qnil, 0 }
};
YesNoOpt o;
VALUE v;
Check_Type(opts, T_HASH);
v = rb_hash_aref(opts, ox_encoding_sym);
if (Qnil == v) {
*ox_default_options.encoding = '\0';
} else {
Check_Type(v, T_STRING);
strncpy(ox_default_options.encoding, StringValuePtr(v), sizeof(ox_default_options.encoding) - 1);
#if HAS_ENCODING_SUPPORT
ox_default_options.rb_enc = rb_enc_find(ox_default_options.encoding);
#elif HAS_PRIVATE_ENCODING
ox_default_options.rb_enc = rb_str_new2(ox_default_options.encoding);
rb_gc_register_address(&ox_default_options.rb_enc);
#endif
}
v = rb_hash_aref(opts, ox_indent_sym);
if (Qnil != v) {
Check_Type(v, T_FIXNUM);
ox_default_options.indent = FIX2INT(v);
}
v = rb_hash_aref(opts, trace_sym);
if (Qnil != v) {
Check_Type(v, T_FIXNUM);
ox_default_options.trace = FIX2INT(v);
}
v = rb_hash_aref(opts, mode_sym);
if (Qnil == v) {
ox_default_options.mode = NoMode;
} else if (object_sym == v) {
ox_default_options.mode = ObjMode;
} else if (generic_sym == v) {
ox_default_options.mode = GenMode;
} else if (limited_sym == v) {
ox_default_options.mode = LimMode;
} else if (hash_sym == v) {
ox_default_options.mode = HashMode;
} else if (hash_no_attrs_sym == v) {
ox_default_options.mode = HashNoAttrMode;
} else {
rb_raise(ox_parse_error_class, ":mode must be :object, :generic, :limited, :hash, :hash_no_attrs, or nil.\n");
}
v = rb_hash_aref(opts, effort_sym);
if (Qnil == v) {
ox_default_options.effort = NoEffort;
} else if (strict_sym == v) {
ox_default_options.effort = StrictEffort;
} else if (tolerant_sym == v) {
ox_default_options.effort = TolerantEffort;
} else if (auto_define_sym == v) {
ox_default_options.effort = AutoEffort;
} else {
rb_raise(ox_parse_error_class, ":effort must be :strict, :tolerant, :auto_define, or nil.\n");
}
v = rb_hash_aref(opts, skip_sym);
if (Qnil == v) {
ox_default_options.skip = NoSkip;
} else if (skip_off_sym == v) {
ox_default_options.skip = OffSkip;
} else if (skip_none_sym == v) {
ox_default_options.skip = NoSkip;
} else if (skip_return_sym == v) {
ox_default_options.skip = CrSkip;
} else if (skip_white_sym == v) {
ox_default_options.skip = SpcSkip;
} else {
rb_raise(ox_parse_error_class, ":skip must be :skip_none, :skip_return, :skip_white, :skip_off, or nil.\n");
}
v = rb_hash_lookup(opts, convert_special_sym);
if (Qnil == v) {
// no change
} else if (Qtrue == v) {
ox_default_options.convert_special = 1;
} else if (Qfalse == v) {
ox_default_options.convert_special = 0;
} else {
rb_raise(ox_parse_error_class, ":convert_special must be true or false.\n");
}
v = rb_hash_aref(opts, invalid_replace_sym);
if (Qnil == v) {
ox_default_options.allow_invalid = Yes;
} else {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(ox_default_options.inv_repl) - 2 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":invalid_replace can be no longer than %d characters.",
(int)sizeof(ox_default_options.inv_repl) - 2);
}
strncpy(ox_default_options.inv_repl + 1, StringValuePtr(v), sizeof(ox_default_options.inv_repl) - 1);
ox_default_options.inv_repl[sizeof(ox_default_options.inv_repl) - 1] = '\0';
*ox_default_options.inv_repl = (char)slen;
ox_default_options.allow_invalid = No;
}
v = rb_hash_aref(opts, strip_namespace_sym);
if (Qfalse == v) {
*ox_default_options.strip_ns = '\0';
} else if (Qtrue == v) {
*ox_default_options.strip_ns = '*';
ox_default_options.strip_ns[1] = '\0';
} else if (Qnil != v) {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(ox_default_options.strip_ns) - 1 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":strip_namespace can be no longer than %d characters.",
(int)sizeof(ox_default_options.strip_ns) - 1);
}
strncpy(ox_default_options.strip_ns, StringValuePtr(v), sizeof(ox_default_options.strip_ns) - 1);
ox_default_options.strip_ns[sizeof(ox_default_options.strip_ns) - 1] = '\0';
}
v = rb_hash_aref(opts, margin_sym);
if (Qnil != v) {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(ox_default_options.margin) - 1 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":margin can be no longer than %d characters.",
(int)sizeof(ox_default_options.margin) - 1);
}
strncpy(ox_default_options.margin, StringValuePtr(v), sizeof(ox_default_options.margin) - 1);
ox_default_options.margin[sizeof(ox_default_options.margin) - 1] = '\0';
ox_default_options.margin_len = strlen(ox_default_options.margin);
}
for (o = ynos; 0 != o->attr; o++) {
v = rb_hash_lookup(opts, o->sym);
if (Qnil == v) {
*o->attr = NotSet;
} else if (Qtrue == v) {
*o->attr = Yes;
} else if (Qfalse == v) {
*o->attr = No;
} else {
rb_raise(ox_parse_error_class, "%s must be true or false.\n", rb_id2name(SYM2ID(o->sym)));
}
}
v = rb_hash_aref(opts, overlay_sym);
if (Qnil == v) {
ox_hints_destroy(ox_default_options.html_hints);
ox_default_options.html_hints = NULL;
} else {
int cnt;
Check_Type(v, T_HASH);
cnt = (int)RHASH_SIZE(v);
if (0 == cnt) {
ox_hints_destroy(ox_default_options.html_hints);
ox_default_options.html_hints = NULL;
} else {
ox_hints_destroy(ox_default_options.html_hints);
ox_default_options.html_hints = ox_hints_dup(ox_hints_html());
rb_hash_foreach(v, set_overlay, (VALUE)ox_default_options.html_hints);
}
}
return Qnil;
}
/* call-seq: parse_obj(xml) => Object
*
* Parses an XML document String that is in the object format and returns an
* Object of the type represented by the XML. This function expects a String
* in the optimized XML format. For other formats use the more generic
* Ox.load() method. Raises an exception if the XML is malformed or the
* classes specified in the XML are not valid.
* - +xml+ [String] XML String in optimized Object format.
* *return* [Object] deserialized Object.
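*
* A round-trip sketch, assuming the XML was produced by Ox.dump():
*
*   require 'ox'
*
*   xml = Ox.dump([1, 'two', :three])
*   obj = Ox.parse_obj(xml)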
*/
static VALUE
to_obj(VALUE self, VALUE ruby_xml) {
char *xml, *x;
size_t len;
VALUE obj;
struct _Options options = ox_default_options;
struct _Err err;
err_init(&err);
Check_Type(ruby_xml, T_STRING);
/* the xml string gets modified so make a copy of it */
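len = RSTRING_LEN(ruby_xml) + 1;
x = defuse_bom(StringValuePtr(ruby_xml), &options);
/* a skipped BOM shortens what is left to copy; without this adjustment the memcpy below reads past the end of the String */
len -= (size_t)(x - RSTRING_PTR(ruby_xml));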
if (SMALL_XML < len) {
xml = ALLOC_N(char, len);
} else {
xml = ALLOCA_N(char, len);
}
memcpy(xml, x, len);
#if HAS_GC_GUARD
rb_gc_disable();
#endif
obj = ox_parse(xml, ox_obj_callbacks, 0, &options, &err);
if (SMALL_XML < len) {
xfree(xml);
}
#if HAS_GC_GUARD
RB_GC_GUARD(obj);
rb_gc_enable();
#endif
if (err_has(&err)) {
ox_err_raise(&err);
}
return obj;
}
/* call-seq: parse(xml) => Ox::Document or Ox::Element
*
* Parses an XML document String into an Ox::Document or Ox::Element.
* - +xml+ [String] XML String
* *return* [Ox::Document or Ox::Element] parsed XML document.
*
* _raise_ [Exception] if the XML is malformed.
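*
* A minimal sketch. Without an XML prolog the result is expected to be an
* Ox::Element rather than an Ox::Document:
*
*   require 'ox'
*
*   top = Ox.parse('<top name="sample"><middle/></top>')
*   puts top.value      # element name
*   puts top[:name]     # attribute value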
*/
static VALUE
to_gen(VALUE self, VALUE ruby_xml) {
char *xml, *x;
size_t len;
VALUE obj;
struct _Options options = ox_default_options;
struct _Err err;
err_init(&err);
Check_Type(ruby_xml, T_STRING);
/* the xml string gets modified so make a copy of it */
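len = RSTRING_LEN(ruby_xml) + 1;
x = defuse_bom(StringValuePtr(ruby_xml), &options);
/* a skipped BOM shortens what is left to copy; without this adjustment the memcpy below reads past the end of the String */
len -= (size_t)(x - RSTRING_PTR(ruby_xml));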
if (SMALL_XML < len) {
xml = ALLOC_N(char, len);
} else {
xml = ALLOCA_N(char, len);
}
memcpy(xml, x, len);
obj = ox_parse(xml, ox_gen_callbacks, 0, &options, &err);
if (SMALL_XML < len) {
xfree(xml);
}
if (err_has(&err)) {
ox_err_raise(&err);
}
return obj;
}
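/* Shared implementation behind load() and load_file(). Starts from a copy of
* the default options, applies any overrides from an options Hash in argv,
* resolves the encoding, skips a UTF-8 BOM, and parses with the callbacks that
* match the selected mode. The xml buffer is modified during parsing.
*/
static VALUE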
load(char *xml, int argc, VALUE *argv, VALUE self, VALUE encoding, Err err) {
VALUE obj;
struct _Options options = ox_default_options;
if (1 == argc && rb_cHash == rb_obj_class(*argv)) {
VALUE h = *argv;
VALUE v;
if (Qnil != (v = rb_hash_lookup(h, mode_sym))) {
if (object_sym == v) {
options.mode = ObjMode;
} else if (optimized_sym == v) {
options.mode = ObjMode;
} else if (generic_sym == v) {
options.mode = GenMode;
} else if (limited_sym == v) {
options.mode = LimMode;
} else if (hash_sym == v) {
options.mode = HashMode;
} else if (hash_no_attrs_sym == v) {
options.mode = HashNoAttrMode;
} else {
rb_raise(ox_parse_error_class, ":mode must be :generic, :object, :limited, :hash, or :hash_no_attrs.\n");
}
}
if (Qnil != (v = rb_hash_lookup(h, effort_sym))) {
if (auto_define_sym == v) {
options.effort = AutoEffort;
} else if (tolerant_sym == v) {
options.effort = TolerantEffort;
} else if (strict_sym == v) {
options.effort = StrictEffort;
} else {
rb_raise(ox_parse_error_class, ":effort must be :strict, :tolerant, or :auto_define.\n");
}
}
if (Qnil != (v = rb_hash_lookup(h, skip_sym))) {
if (skip_none_sym == v) {
options.skip = NoSkip;
} else if (skip_off_sym == v) {
options.skip = OffSkip;
} else if (skip_return_sym == v) {
options.skip = CrSkip;
} else if (skip_white_sym == v) {
options.skip = SpcSkip;
} else {
rb_raise(ox_parse_error_class, ":skip must be :skip_none, :skip_return, :skip_white, or :skip_off.\n");
}
}
if (Qnil != (v = rb_hash_lookup(h, trace_sym))) {
Check_Type(v, T_FIXNUM);
options.trace = FIX2INT(v);
}
if (Qnil != (v = rb_hash_lookup(h, symbolize_keys_sym))) {
options.sym_keys = (Qfalse == v) ? No : Yes;
}
if (Qnil != (v = rb_hash_lookup(h, convert_special_sym))) {
options.convert_special = (Qfalse != v);
}
v = rb_hash_lookup(h, invalid_replace_sym);
if (Qnil == v) {
if (Qtrue == rb_funcall(h, has_key_id, 1, invalid_replace_sym)) {
options.allow_invalid = Yes;
}
} else {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(options.inv_repl) - 2 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":invalid_replace can be no longer than %d characters.",
(int)sizeof(options.inv_repl) - 2);
}
strncpy(options.inv_repl + 1, StringValuePtr(v), sizeof(options.inv_repl) - 1);
options.inv_repl[sizeof(options.inv_repl) - 1] = '\0';
*options.inv_repl = (char)slen;
options.allow_invalid = No;
}
v = rb_hash_lookup(h, strip_namespace_sym);
if (Qfalse == v) {
*options.strip_ns = '\0';
} else if (Qtrue == v) {
*options.strip_ns = '*';
options.strip_ns[1] = '\0';
} else if (Qnil != v) {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(options.strip_ns) - 1 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":strip_namespace can be no longer than %d characters.",
(int)sizeof(options.strip_ns) - 1);
}
strncpy(options.strip_ns, StringValuePtr(v), sizeof(options.strip_ns) - 1);
options.strip_ns[sizeof(options.strip_ns) - 1] = '\0';
}
v = rb_hash_lookup(h, margin_sym);
if (Qnil != v) {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(options.margin) - 1 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":margin can be no longer than %d characters.",
(int)sizeof(options.margin) - 1);
}
strncpy(options.margin, StringValuePtr(v), sizeof(options.margin) - 1);
options.margin[sizeof(options.margin) - 1] = '\0';
options.margin_len = strlen(options.margin);
}
}
#if HAS_ENCODING_SUPPORT
if ('\0' == *options.encoding) {
if (Qnil != encoding) {
options.rb_enc = rb_enc_from_index(rb_enc_get_index(encoding));
} else {
options.rb_enc = 0;
}
} else if (0 == options.rb_enc) {
options.rb_enc = rb_enc_find(options.encoding);
}
#elif HAS_PRIVATE_ENCODING
if ('\0' == *options.encoding) {
if (Qnil != encoding) {
options.rb_enc = encoding;
} else {
options.rb_enc = Qnil;
}
} else if (0 == options.rb_enc) {
options.rb_enc = rb_str_new2(options.encoding);
rb_gc_register_address(&options.rb_enc);
}
#endif
xml = defuse_bom(xml, &options);
switch (options.mode) {
case ObjMode:
#if HAS_GC_GUARD
rb_gc_disable();
#endif
obj = ox_parse(xml, ox_obj_callbacks, 0, &options, err);
#if HAS_GC_GUARD
RB_GC_GUARD(obj);
rb_gc_enable();
#endif
break;
case GenMode:
obj = ox_parse(xml, ox_gen_callbacks, 0, &options, err);
break;
case LimMode:
obj = ox_parse(xml, ox_limited_callbacks, 0, &options, err);
break;
case HashMode:
obj = ox_parse(xml, ox_hash_callbacks, 0, &options, err);
break;
case HashNoAttrMode:
obj = ox_parse(xml, ox_hash_no_attrs_callbacks, 0, &options, err);
break;
case NoMode:
obj = ox_parse(xml, ox_nomode_callbacks, 0, &options, err);
break;
default:
obj = ox_parse(xml, ox_gen_callbacks, 0, &options, err);
break;
}
return obj;
}
/* call-seq: load(xml, options) => Ox::Document or Ox::Element or Object
*
* Parses an XML document String into an Ox::Document, or Ox::Element, or
* Object depending on the options. Raises an exception if the XML is malformed
* or the classes specified are not valid. If a block is given it will be called
* on the completion of each complete top level entity with that entity as its
* only argument.
*
* - +xml+ [String] XML String
* - +options+ [Hash] load options
* - *:mode* [:object|:generic|:limited|:hash|:hash_no_attrs] format expected
* - _:object_ - object format
* - _:generic_ - read as a generic XML file
* - _:limited_ - read as a generic XML file but with callbacks on text and elements events only
* - _:hash_ - read and convert to a Hash and core class objects only
* - _:hash_no_attrs_ - read and convert to a Hash and core class objects only without capturing attributes
* - *:effort* [:strict|:tolerant|:auto_define] effort to use when an undefined class is encountered, default: :strict
* - _:strict_ - raise a NameError for missing classes and modules
* - _:tolerant_ - return nil for missing classes and modules
* - _:auto_define_ - auto define missing classes and modules
* - *:trace* [Fixnum] trace level as a Fixnum, default: 0 (silent)
* - *:symbolize_keys* [true|false|nil] symbolize element attribute keys or leave as Strings
* - *:invalid_replace* [nil|String] replacement string for invalid XML characters on dump. nil means invalid characters are included anyway, as hex. A String, limited to 10 characters, replaces each invalid character.
* - *:strip_namespace* [String|true|false] "" or false result in no namespace stripping. A string of "*" or true will strip all namespaces. Any other non-empty string indicates that matching namespaces will be stripped.
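*
* A sketch of choosing the result type with :mode (results are not shown since
* they depend on the mode):
*
*   require 'ox'
*
*   xml  = '<top name="sample"><middle>text</middle></top>'
*   elem = Ox.load(xml, :mode => :generic)
*   hash = Ox.load(xml, :mode => :hash)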
*/
static VALUE
load_str(int argc, VALUE *argv, VALUE self) {
char *xml;
size_t len;
VALUE obj;
VALUE encoding;
struct _Err err;
err_init(&err);
Check_Type(*argv, T_STRING);
/* the xml string gets modified so make a copy of it */
len = RSTRING_LEN(*argv) + 1;
if (SMALL_XML < len) {
xml = ALLOC_N(char, len);
} else {
xml = ALLOCA_N(char, len);
}
#if HAS_ENCODING_SUPPORT
#ifdef MACRUBY_RUBY
encoding = rb_funcall(*argv, encoding_id, 0);
#else
encoding = rb_obj_encoding(*argv);
#endif
#elif HAS_PRIVATE_ENCODING
encoding = rb_funcall(*argv, encoding_id, 0);
#else
encoding = Qnil;
#endif
memcpy(xml, StringValuePtr(*argv), len);
obj = load(xml, argc - 1, argv + 1, self, encoding, &err);
if (SMALL_XML < len) {
xfree(xml);
}
if (err_has(&err)) {
ox_err_raise(&err);
}
return obj;
}
/* call-seq: load_file(file_path, options) => Ox::Document or Ox::Element or Object
*
* Parses an XML document from a file into an Ox::Document, or Ox::Element,
* or Object depending on the options. Raises an exception if the XML is
* malformed or the classes specified are not valid.
* - +file_path+ [String] file path to read the XML document from
* - +options+ [Hash] load options
* - *:mode* [:object|:generic|:limited|:hash|:hash_no_attrs] format expected
* - _:object_ - object format
* - _:generic_ - read as a generic XML file
* - _:limited_ - read as a generic XML file but with callbacks on text and elements events only
* - _:hash_ - read and convert to a Hash and core class objects only
* - _:hash_no_attrs_ - read and convert to a Hash and core class objects only without capturing attributes
* - *:effort* [:strict|:tolerant|:auto_define] effort to use when an undefined class is encountered, default: :strict
* - _:strict_ - raise a NameError for missing classes and modules
* - _:tolerant_ - return nil for missing classes and modules
* - _:auto_define_ - auto define missing classes and modules
* - *:trace* [Fixnum] trace level as a Fixnum, default: 0 (silent)
* - *:symbolize_keys* [true|false|nil] symbolize element attribute keys or leave as Strings
* - *:invalid_replace* [nil|String] replacement string for invalid XML characters on dump. nil means invalid characters are included anyway, as hex. A String, limited to 10 characters, replaces each invalid character.
* - *:strip_namespace* [String|true|false] "" or false result in no namespace stripping. A string of "*" or true will strip all namespaces. Any other non-empty string indicates that matching namespaces will be stripped.
*/
static VALUE
load_file(int argc, VALUE *argv, VALUE self) {
char *path;
char *xml;
FILE *f;
size_t len;
VALUE obj;
struct _Err err;
err_init(&err);
Check_Type(*argv, T_STRING);
path = StringValuePtr(*argv);
if (0 == (f = fopen(path, "r"))) {
rb_raise(rb_eIOError, "%s\n", strerror(errno));
}
fseek(f, 0, SEEK_END);
len = ftell(f);
if (SMALL_XML < len) {
xml = ALLOC_N(char, len + 1);
} else {
xml = ALLOCA_N(char, len + 1);
}
fseek(f, 0, SEEK_SET);
if (len != fread(xml, 1, len, f)) {
ox_err_set(&err, rb_eLoadError, "Failed to read %ld bytes from %s.\n", (long)len, path);
obj = Qnil;
} else {
xml[len] = '\0';
obj = load(xml, argc - 1, argv + 1, self, Qnil, &err);
}
fclose(f);
if (SMALL_XML < len) {
xfree(xml);
}
if (err_has(&err)) {
ox_err_raise(&err);
}
return obj;
}
/* call-seq: sax_parse(handler, io, options)
*
* Parses an IO stream or file containing an XML document. Raises an exception
* if the XML is malformed or the classes specified are not valid.
* - +handler+ [Ox::Sax] SAX-like handler (responds to Ox::Sax methods)
* - +io+ [IO|String] IO Object to read from
* - +options+ [Hash] parse options
* - *:convert_special* [true|false] flag indicating special characters like < are converted
* - *:symbolize* [true|false] flag indicating the parser symbolizes element and attribute names
* - *:smart* [true|false] flag indicating the parser uses hints if available (use with html)
* - *:skip* [:skip_none|:skip_return|:skip_white|:skip_off] flag indicating whether the parser skips \\r or collapses white space into a single space. Defaults to the :skip value in Ox.default_options (initially :skip_white)
* - *:strip_namespace* [nil|String|true|false] "" or false result in no namespace stripping. A string of "*" or true will strip all namespaces. Any other non-empty string indicates that matching namespaces will be stripped.
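*
* A minimal sketch of a handler; NamePrinter is a hypothetical class that
* overrides only two of the Ox::Sax callbacks:
*
*   require 'ox'
*   require 'stringio'
*
*   class NamePrinter < ::Ox::Sax
*     def start_element(name)
*       puts "start: #{name}"
*     end
*
*     def text(value)
*       puts "text: #{value}"
*     end
*   end
*
*   io = StringIO.new('<top><middle>hello</middle></top>')
*   Ox.sax_parse(NamePrinter.new, io, :symbolize => true)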
*/
static VALUE
sax_parse(int argc, VALUE *argv, VALUE self) {
struct _SaxOptions options;
options.symbolize = (No != ox_default_options.sym_keys);
options.convert_special = ox_default_options.convert_special;
options.smart = (Yes == ox_default_options.smart);
options.skip = ox_default_options.skip;
options.hints = NULL;
strcpy(options.strip_ns, ox_default_options.strip_ns);
if (argc < 2) {
rb_raise(ox_parse_error_class, "Wrong number of arguments to sax_parse.\n");
}
if (3 <= argc && rb_cHash == rb_obj_class(argv[2])) {
VALUE h = argv[2];
VALUE v;
if (Qnil != (v = rb_hash_lookup(h, convert_special_sym))) {
options.convert_special = (Qtrue == v);
}
if (Qnil != (v = rb_hash_lookup(h, smart_sym))) {
options.smart = (Qtrue == v);
}
if (Qnil != (v = rb_hash_lookup(h, symbolize_sym))) {
options.symbolize = (Qtrue == v);
}
if (Qnil != (v = rb_hash_lookup(h, skip_sym))) {
if (skip_return_sym == v) {
options.skip = CrSkip;
} else if (skip_white_sym == v) {
options.skip = SpcSkip;
} else if (skip_none_sym == v) {
options.skip = NoSkip;
} else if (skip_off_sym == v) {
options.skip = OffSkip;
}
}
if (Qnil != (v = rb_hash_lookup(h, strip_namespace_sym))) {
if (Qfalse == v) {
*options.strip_ns = '\0';
} else if (Qtrue == v) {
*options.strip_ns = '*';
options.strip_ns[1] = '\0';
} else {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(options.strip_ns) - 1 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":strip_namespace can be no longer than %d characters.",
(int)sizeof(options.strip_ns) - 1);
}
strncpy(options.strip_ns, StringValuePtr(v), sizeof(options.strip_ns) - 1);
options.strip_ns[sizeof(options.strip_ns) - 1] = '\0';
}
}
}
ox_sax_parse(argv[0], argv[1], &options);
return Qnil;
}
/* call-seq: sax_html(handler, io, options)
*
* Parses an IO stream or file containing an HTML document. Raises an exception
* if the document is malformed or the classes specified are not valid.
* - +handler+ [Ox::Sax] SAX-like handler (responds to Ox::Sax methods)
* - +io+ [IO|String] IO Object to read from
* - +options+ [Hash] parse options
* - *:convert_special* [true|false] flag indicating special characters like < are converted
* - *:symbolize* [true|false] flag indicating the parser symbolizes element and attribute names
* - *:skip* [:skip_none|:skip_return|:skip_white|:skip_off] flag indicating whether the parser skips \\r or collapses white space into a single space. Defaults to the :skip value in Ox.default_options (initially :skip_white)
* - *:overlay* [Hash] a Hash of keys that match html element names and values that are one of
* - _:active_ - make the normal callback for the element
* - _:nest_ok_ - active but ignore nest check
* - _:inactive_ - do not make the element start, end, or attribute callbacks for this element only
* - _:block_ - block this and all children callbacks
* - _:off_ - block this element and it's children unless the child element is active
* - _:abort_ - abort the html processing and return
*/
static VALUE
sax_html(int argc, VALUE *argv, VALUE self) {
struct _SaxOptions options;
bool free_hints = false;
options.symbolize = (No != ox_default_options.sym_keys);
options.convert_special = ox_default_options.convert_special;
options.smart = true;
options.skip = ox_default_options.skip;
options.hints = ox_default_options.html_hints;
if (NULL == options.hints) {
options.hints = ox_hints_html();
}
*options.strip_ns = '\0';
if (argc < 2) {
rb_raise(ox_parse_error_class, "Wrong number of arguments to sax_html.\n");
}
if (3 <= argc && rb_cHash == rb_obj_class(argv[2])) {
volatile VALUE h = argv[2];
volatile VALUE v;
if (Qnil != (v = rb_hash_lookup(h, convert_special_sym))) {
options.convert_special = (Qtrue == v);
}
if (Qnil != (v = rb_hash_lookup(h, symbolize_sym))) {
options.symbolize = (Qtrue == v);
}
if (Qnil != (v = rb_hash_lookup(h, skip_sym))) {
if (skip_return_sym == v) {
options.skip = CrSkip;
} else if (skip_white_sym == v) {
options.skip = SpcSkip;
} else if (skip_none_sym == v) {
options.skip = NoSkip;
} else if (skip_off_sym == v) {
options.skip = OffSkip;
}
}
if (Qnil != (v = rb_hash_lookup(h, overlay_sym))) {
int cnt;
Check_Type(v, T_HASH);
cnt = (int)RHASH_SIZE(v);
if (0 == cnt) {
options.hints = ox_hints_html();
} else {
options.hints = ox_hints_dup(options.hints);
free_hints = true;
rb_hash_foreach(v, set_overlay, (VALUE)options.hints);
}
}
}
ox_sax_parse(argv[0], argv[1], &options);
if (free_hints) {
ox_hints_destroy(options.hints);
}
return Qnil;
}
static void
parse_dump_options(VALUE ropts, Options copts) {
struct _YesNoOpt ynos[] = {
{ with_xml_sym, &copts->with_xml },
{ with_dtd_sym, &copts->with_dtd },
{ with_instruct_sym, &copts->with_instruct },
{ xsd_date_sym, &copts->xsd_date },
{ circular_sym, &copts->circular },
{ Qnil, 0 }
};
YesNoOpt o;
if (rb_cHash == rb_obj_class(ropts)) {
VALUE v;
if (Qnil != (v = rb_hash_lookup(ropts, ox_indent_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v) && T_FIXNUM != rb_type(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":indent must be a Fixnum.\n");
}
copts->indent = NUM2INT(v);
}
if (Qnil != (v = rb_hash_lookup(ropts, trace_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v) && T_FIXNUM != rb_type(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":trace must be a Fixnum.\n");
}
copts->trace = NUM2INT(v);
}
if (Qnil != (v = rb_hash_lookup(ropts, ox_encoding_sym))) {
if (rb_cString != rb_obj_class(v)) {
rb_raise(ox_parse_error_class, ":encoding must be a String.\n");
}
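strncpy(copts->encoding, StringValuePtr(v), sizeof(copts->encoding) - 1);
// strncpy does not terminate on truncation, so terminate explicitly.
copts->encoding[sizeof(copts->encoding) - 1] = '\0';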
}
if (Qnil != (v = rb_hash_lookup(ropts, effort_sym))) {
if (auto_define_sym == v) {
copts->effort = AutoEffort;
} else if (tolerant_sym == v) {
copts->effort = TolerantEffort;
} else if (strict_sym == v) {
copts->effort = StrictEffort;
} else {
rb_raise(ox_parse_error_class, ":effort must be :strict, :tolerant, or :auto_define.\n");
}
}
v = rb_hash_lookup(ropts, invalid_replace_sym);
if (Qnil == v) {
if (Qtrue == rb_funcall(ropts, has_key_id, 1, invalid_replace_sym)) {
copts->allow_invalid = Yes;
}
} else {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(copts->inv_repl) - 2 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":invalid_replace can be no longer than %d characters.",
(int)sizeof(copts->inv_repl) - 2);
}
strncpy(copts->inv_repl + 1, StringValuePtr(v), sizeof(copts->inv_repl) - 1);
copts->inv_repl[sizeof(copts->inv_repl) - 1] = '\0';
*copts->inv_repl = (char)slen;
copts->allow_invalid = No;
}
v = rb_hash_lookup(ropts, margin_sym);
if (Qnil != v) {
long slen;
Check_Type(v, T_STRING);
slen = RSTRING_LEN(v);
if (sizeof(copts->margin) - 2 < (size_t)slen) {
rb_raise(ox_parse_error_class, ":margin can be no longer than %d characters.",
(int)sizeof(copts->margin) - 2);
}
strncpy(copts->margin, StringValuePtr(v), sizeof(copts->margin) - 1);
copts->margin[sizeof(copts->margin) - 1] = '\0';
copts->margin_len = (char)slen;
}
for (o = ynos; 0 != o->attr; o++) {
if (Qnil != (v = rb_hash_lookup(ropts, o->sym))) {
VALUE c = rb_obj_class(v);
if (rb_cTrueClass == c) {
*o->attr = Yes;
} else if (rb_cFalseClass == c) {
*o->attr = No;
} else {
rb_raise(ox_parse_error_class, "%s must be true or false.\n", rb_id2name(SYM2ID(o->sym)));
}
}
}
}
}
/* call-seq: dump(obj, options) => xml-string
*
* Dumps an Object (obj) to a string.
* - +obj+ [Object] Object to serialize as an XML document String
* - +options+ [Hash] formatting options
* - *:indent* [Fixnum] number of spaces to indent each nesting level
* - *:xsd_date* [true|false] use XSD date format if true, default: false
* - *:circular* [true|false] allow circular references, default: false
* - *:effort* [:strict|:tolerant] effort to use when an undumpable object (e.g., IO) is encountered, default: :strict
* - _:strict_ - raise a NotImplementedError if an undumpable object is encountered
* - _:tolerant_ - replace undumpable objects with nil
*
* Note that an indent of less than zero will result in a tight one-line output
* unless the text in the XML fields contains newline characters.
*/
static VALUE
dump(int argc, VALUE *argv, VALUE self) {
char *xml;
struct _Options copts = ox_default_options;
VALUE rstr;
if (2 == argc) {
parse_dump_options(argv[1], &copts);
}
if (0 == (xml = ox_write_obj_to_str(*argv, &copts))) {
rb_raise(rb_eNoMemError, "Not enough memory.\n");
}
rstr = rb_str_new2(xml);
#if HAS_ENCODING_SUPPORT
if ('\0' != *copts.encoding) {
rb_enc_associate(rstr, rb_enc_find(copts.encoding));
}
#elif HAS_PRIVATE_ENCODING
if ('\0' != *copts.encoding) {
rb_funcall(rstr, ox_force_encoding_id, 1, rb_str_new2(copts.encoding));
}
#endif
xfree(xml);
return rstr;
}
/* call-seq: to_file(file_path, obj, options)
*
* Dumps an Object to the specified file.
* - +file_path+ [String] file path to write the XML document to
* - +obj+ [Object] Object to serialize as an XML document String
* - +options+ [Hash] formatting options
* - *:indent* [Fixnum] number of spaces to indent each nesting level
* - *:xsd_date* [true|false] use XSD date format if true, default: false
* - *:circular* [true|false] allow circular references, default: false
* - *:effort* [:strict|:tolerant] effort to use when an undumpable object (e.g., IO) is encountered, default: :strict
* - _:strict_ - raise a NotImplementedError if an undumpable object is encountered
* - _:tolerant_ - replace undumpable objects with nil
*
* Note that an indent of less than zero will result in a tight one-line output
* unless the text in the XML fields contains newline characters.
*/
static VALUE
to_file(int argc, VALUE *argv, VALUE self) {
struct _Options copts = ox_default_options;
if (3 == argc) {
parse_dump_options(argv[2], &copts);
}
Check_Type(*argv, T_STRING);
ox_write_obj_to_file(argv[1], StringValuePtr(*argv), &copts);
return Qnil;
}
#if WITH_CACHE_TESTS
extern void ox_cache_test(void);
static VALUE
cache_test(VALUE self) {
ox_cache_test();
return Qnil;
}
extern void ox_cache8_test(void);
static VALUE
cache8_test(VALUE self) {
ox_cache8_test();
return Qnil;
}
#endif
void Init_ox() {
Ox = rb_define_module("Ox");
rb_define_module_function(Ox, "default_options", get_def_opts, 0);
rb_define_module_function(Ox, "default_options=", set_def_opts, 1);
rb_define_module_function(Ox, "parse_obj", to_obj, 1);
rb_define_module_function(Ox, "parse", to_gen, 1);
rb_define_module_function(Ox, "load", load_str, -1);
rb_define_module_function(Ox, "sax_parse", sax_parse, -1);
rb_define_module_function(Ox, "sax_html", sax_html, -1);
rb_define_module_function(Ox, "to_xml", dump, -1);
rb_define_module_function(Ox, "dump", dump, -1);
rb_define_module_function(Ox, "load_file", load_file, -1);
rb_define_module_function(Ox, "to_file", to_file, -1);
rb_define_module_function(Ox, "sax_html_overlay", sax_html_overlay, 0);
ox_init_builder(Ox);
rb_require("time");
rb_require("date");
rb_require("bigdecimal");
rb_require("stringio");
ox_abort_id = rb_intern("abort");
ox_at_column_id = rb_intern("@column");
ox_at_content_id = rb_intern("@content");
ox_at_id = rb_intern("at");
ox_at_line_id = rb_intern("@line");
ox_at_pos_id = rb_intern("@pos");
ox_at_value_id = rb_intern("@value");
ox_attr_id = rb_intern("attr");
ox_attr_value_id = rb_intern("attr_value");
ox_attributes_id = rb_intern("@attributes");
ox_attrs_done_id = rb_intern("attrs_done");
ox_beg_id = rb_intern("@beg");
ox_cdata_id = rb_intern("cdata");
ox_comment_id = rb_intern("comment");
ox_den_id = rb_intern("@den");
ox_doctype_id = rb_intern("doctype");
ox_end_element_id = rb_intern("end_element");
ox_end_id = rb_intern("@end");
ox_end_instruct_id = rb_intern("end_instruct");
ox_error_id = rb_intern("error");
ox_excl_id = rb_intern("@excl");
ox_external_encoding_id = rb_intern("external_encoding");
ox_fileno_id = rb_intern("fileno");
ox_force_encoding_id = rb_intern("force_encoding");
ox_inspect_id = rb_intern("inspect");
ox_instruct_id = rb_intern("instruct");
ox_jd_id = rb_intern("jd");
ox_keys_id = rb_intern("keys");
ox_local_id = rb_intern("local");
ox_mesg_id = rb_intern("mesg");
ox_message_id = rb_intern("message");
ox_nodes_id = rb_intern("@nodes");
ox_new_id = rb_intern("new");
ox_num_id = rb_intern("@num");
ox_parse_id = rb_intern("parse");
ox_pos_id = rb_intern("pos");
ox_read_id = rb_intern("read");
ox_readpartial_id = rb_intern("readpartial");
ox_start_element_id = rb_intern("start_element");
ox_string_id = rb_intern("string");
ox_text_id = rb_intern("text");
ox_to_c_id = rb_intern("to_c");
ox_to_s_id = rb_intern("to_s");
ox_to_sym_id = rb_intern("to_sym");
ox_tv_nsec_id = rb_intern("tv_nsec");
ox_tv_sec_id = rb_intern("tv_sec");
ox_tv_usec_id = rb_intern("tv_usec");
ox_value_id = rb_intern("value");
encoding_id = rb_intern("encoding");
has_key_id = rb_intern("has_key?");
rb_require("ox/version");
rb_require("ox/error");
rb_require("ox/hasattrs");
rb_require("ox/node");
rb_require("ox/comment");
rb_require("ox/instruct");
rb_require("ox/cdata");
rb_require("ox/doctype");
rb_require("ox/element");
rb_require("ox/document");
rb_require("ox/bag");
rb_require("ox/sax");
ox_time_class = rb_const_get(rb_cObject, rb_intern("Time"));
ox_date_class = rb_const_get(rb_cObject, rb_intern("Date"));
ox_parse_error_class = rb_const_get_at(Ox, rb_intern("ParseError"));
ox_arg_error_class = rb_const_get_at(Ox, rb_intern("ArgError"));
ox_struct_class = rb_const_get(rb_cObject, rb_intern("Struct"));
ox_stringio_class = rb_const_get(rb_cObject, rb_intern("StringIO"));
ox_bigdecimal_class = rb_const_get(rb_cObject, rb_intern("BigDecimal"));
abort_sym = ID2SYM(rb_intern("abort")); rb_gc_register_address(&abort_sym);
active_sym = ID2SYM(rb_intern("active")); rb_gc_register_address(&active_sym);
auto_define_sym = ID2SYM(rb_intern("auto_define")); rb_gc_register_address(&auto_define_sym);
auto_sym = ID2SYM(rb_intern("auto")); rb_gc_register_address(&auto_sym);
block_sym = ID2SYM(rb_intern("block")); rb_gc_register_address(&block_sym);
circular_sym = ID2SYM(rb_intern("circular")); rb_gc_register_address(&circular_sym);
convert_special_sym = ID2SYM(rb_intern("convert_special")); rb_gc_register_address(&convert_special_sym);
effort_sym = ID2SYM(rb_intern("effort")); rb_gc_register_address(&effort_sym);
generic_sym = ID2SYM(rb_intern("generic")); rb_gc_register_address(&generic_sym);
hash_no_attrs_sym = ID2SYM(rb_intern("hash_no_attrs")); rb_gc_register_address(&hash_no_attrs_sym);
hash_sym = ID2SYM(rb_intern("hash")); rb_gc_register_address(&hash_sym);
inactive_sym = ID2SYM(rb_intern("inactive")); rb_gc_register_address(&inactive_sym);
invalid_replace_sym = ID2SYM(rb_intern("invalid_replace")); rb_gc_register_address(&invalid_replace_sym);
limited_sym = ID2SYM(rb_intern("limited")); rb_gc_register_address(&limited_sym);
margin_sym = ID2SYM(rb_intern("margin")); rb_gc_register_address(&margin_sym);
mode_sym = ID2SYM(rb_intern("mode")); rb_gc_register_address(&mode_sym);
nest_ok_sym = ID2SYM(rb_intern("nest_ok")); rb_gc_register_address(&nest_ok_sym);
object_sym = ID2SYM(rb_intern("object")); rb_gc_register_address(&object_sym);
off_sym = ID2SYM(rb_intern("off")); rb_gc_register_address(&off_sym);
opt_format_sym = ID2SYM(rb_intern("opt_format")); rb_gc_register_address(&opt_format_sym);
optimized_sym = ID2SYM(rb_intern("optimized")); rb_gc_register_address(&optimized_sym);
overlay_sym = ID2SYM(rb_intern("overlay")); rb_gc_register_address(&overlay_sym);
ox_encoding_sym = ID2SYM(rb_intern("encoding")); rb_gc_register_address(&ox_encoding_sym);
ox_indent_sym = ID2SYM(rb_intern("indent")); rb_gc_register_address(&ox_indent_sym);
ox_size_sym = ID2SYM(rb_intern("size")); rb_gc_register_address(&ox_size_sym);
ox_standalone_sym = ID2SYM(rb_intern("standalone")); rb_gc_register_address(&ox_standalone_sym);
ox_version_sym = ID2SYM(rb_intern("version")); rb_gc_register_address(&ox_version_sym);
skip_none_sym = ID2SYM(rb_intern("skip_none")); rb_gc_register_address(&skip_none_sym);
skip_off_sym = ID2SYM(rb_intern("skip_off")); rb_gc_register_address(&skip_off_sym);
skip_return_sym = ID2SYM(rb_intern("skip_return")); rb_gc_register_address(&skip_return_sym);
skip_sym = ID2SYM(rb_intern("skip")); rb_gc_register_address(&skip_sym);
skip_white_sym = ID2SYM(rb_intern("skip_white")); rb_gc_register_address(&skip_white_sym);
smart_sym = ID2SYM(rb_intern("smart")); rb_gc_register_address(&smart_sym);
strict_sym = ID2SYM(rb_intern("strict")); rb_gc_register_address(&strict_sym);
strip_namespace_sym = ID2SYM(rb_intern("strip_namespace")); rb_gc_register_address(&strip_namespace_sym);
symbolize_keys_sym = ID2SYM(rb_intern("symbolize_keys")); rb_gc_register_address(&symbolize_keys_sym);
symbolize_sym = ID2SYM(rb_intern("symbolize")); rb_gc_register_address(&symbolize_sym);
tolerant_sym = ID2SYM(rb_intern("tolerant")); rb_gc_register_address(&tolerant_sym);
trace_sym = ID2SYM(rb_intern("trace")); rb_gc_register_address(&trace_sym);
with_dtd_sym = ID2SYM(rb_intern("with_dtd")); rb_gc_register_address(&with_dtd_sym);
with_instruct_sym = ID2SYM(rb_intern("with_instructions")); rb_gc_register_address(&with_instruct_sym);
with_xml_sym = ID2SYM(rb_intern("with_xml")); rb_gc_register_address(&with_xml_sym);
xsd_date_sym = ID2SYM(rb_intern("xsd_date")); rb_gc_register_address(&xsd_date_sym);
ox_empty_string = rb_str_new2(""); rb_gc_register_address(&ox_empty_string);
ox_zero_fixnum = INT2NUM(0); rb_gc_register_address(&ox_zero_fixnum);
ox_sym_bank = rb_ary_new(); rb_gc_register_address(&ox_sym_bank);
ox_document_clas = rb_const_get_at(Ox, rb_intern("Document"));
ox_element_clas = rb_const_get_at(Ox, rb_intern("Element"));
ox_instruct_clas = rb_const_get_at(Ox, rb_intern("Instruct"));
ox_comment_clas = rb_const_get_at(Ox, rb_intern("Comment"));
ox_raw_clas = rb_const_get_at(Ox, rb_intern("Raw"));
ox_doctype_clas = rb_const_get_at(Ox, rb_intern("DocType"));
ox_cdata_clas = rb_const_get_at(Ox, rb_intern("CData"));
ox_bag_clas = rb_const_get_at(Ox, rb_intern("Bag"));
ox_cache_new(&ox_symbol_cache);
ox_cache_new(&ox_class_cache);
ox_cache_new(&ox_attr_cache);
ox_sax_define();
#if WITH_CACHE_TESTS
// space added to stop yardoc from trying to document
rb_define _module_function(Ox, "cache_test", cache_test, 0);
rb_define _module_function(Ox, "cache8_test", cache8_test, 0);
#endif
#if HAS_ENCODING_SUPPORT
ox_utf8_encoding = rb_enc_find("UTF-8");
#elif HAS_PRIVATE_ENCODING
ox_utf8_encoding = rb_str_new2("UTF-8");
rb_gc_register_address(&ox_utf8_encoding);
#endif
}
#if __GNUC__ > 4
_Noreturn void
#else
void
#endif
_ox_raise_error(const char *msg, const char *xml, const char *current, const char* file, int line) {
int xline = 1;
int col = 1;
for (; xml < current && '\n' != *current; current--) {
col++;
}
for (; xml < current; current--) {
if ('\n' == *current) {
xline++;
}
}
#if HAS_GC_GUARD
rb_gc_enable();
#endif
rb_raise(ox_parse_error_class, "%s at line %d, column %d [%s:%d]\n", msg, xline, col, file, line);
}
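/* Illustrative sketch, not part of Ox: the two backward scans used above to
* turn a pointer into a line and column can be tried in isolation. XmlPos and
* xml_pos() are hypothetical names introduced only for this example; the loops
* mirror the ones in _ox_raise_error(), so the column is counted the same way
* (the character right after a newline is reported as column 2).
*/

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    int line;
    int col;
} XmlPos;

/* Computes the position of current inside the buffer that starts at xml. */
static XmlPos
xml_pos(const char *xml, const char *current) {
    XmlPos pos = { 1, 1 };

    assert(NULL != xml && xml <= current);
    /* Walk back to the previous newline (or the buffer start) for the column. */
    for (; xml < current && '\n' != *current; current--) {
        pos.col++;
    }
    /* Every newline between the buffer start and that point adds a line. */
    for (; xml < current; current--) {
        if ('\n' == *current) {
            pos.line++;
        }
    }
    return pos;
}
```

/* For the buffer "<a>\n<b>" and a pointer to the final '>', this sketch
* reports line 2, column 4.
*/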
ox-2.8.2/ext/ox/sax_stack.h 0000644 0000041 0000041 00000003456 13203413063 015563 0 ustar www-data www-data /* sax_stack.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_SAX_STACK_H__
#define __OX_SAX_STACK_H__
#include "sax_hint.h"
#define STACK_INC 32
typedef struct _Nv {
const char *name;
VALUE val;
int childCnt;
Hint hint;
} *Nv;
typedef struct _NStack {
struct _Nv base[STACK_INC];
Nv head; /* current stack */
Nv end; /* stack end */
Nv tail; /* pointer to one past last element name on stack */
} *NStack;
inline static void
stack_init(NStack stack) {
stack->head = stack->base;
stack->end = stack->base + sizeof(stack->base) / sizeof(struct _Nv);
stack->tail = stack->head;
}
inline static int
stack_empty(NStack stack) {
return (stack->head == stack->tail);
}
inline static void
stack_cleanup(NStack stack) {
if (stack->base != stack->head) {
xfree(stack->head);
}
}
inline static void
stack_push(NStack stack, const char *name, VALUE val, Hint hint) {
if (stack->end <= stack->tail) {
size_t len = stack->end - stack->head;
size_t toff = stack->tail - stack->head;
if (stack->base == stack->head) {
stack->head = ALLOC_N(struct _Nv, len + STACK_INC);
memcpy(stack->head, stack->base, sizeof(struct _Nv) * len);
} else {
REALLOC_N(stack->head, struct _Nv, len + STACK_INC);
}
stack->tail = stack->head + toff;
stack->end = stack->head + len + STACK_INC;
}
stack->tail->name = name;
stack->tail->val = val;
stack->tail->hint = hint;
stack->tail->childCnt = 0;
stack->tail++;
}
inline static Nv
stack_peek(NStack stack) {
if (stack->head < stack->tail) {
return stack->tail - 1;
}
return 0;
}
inline static Nv
stack_pop(NStack stack) {
if (stack->head < stack->tail) {
stack->tail--;
return stack->tail;
}
return 0;
}
#endif /* __OX_SAX_STACK_H__ */
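/* Illustrative sketch, not part of Ox: the stack above starts in an inline
* array and only moves to the heap once that array is full. The version below
* shows the same growth pattern with plain ints and malloc/realloc in place of
* Ruby's ALLOC_N/REALLOC_N. IntStack, INLINE_SLOTS, and the int_stack_*
* functions are hypothetical names used only here. As with NStack, the struct
* must not be copied after initialization because head may point into base.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 4

typedef struct {
    int  base[INLINE_SLOTS]; /* inline storage used until it overflows */
    int *head;               /* start of the current storage */
    int *end;                /* one past the last usable slot */
    int *tail;               /* one past the last pushed value */
} IntStack;

static void
int_stack_init(IntStack *s) {
    s->head = s->base;
    s->end = s->base + INLINE_SLOTS;
    s->tail = s->head;
}

static void
int_stack_push(IntStack *s, int v) {
    if (s->end <= s->tail) {
        size_t len = (size_t)(s->end - s->head);
        size_t toff = (size_t)(s->tail - s->head);
        int   *grown;

        if (s->base == s->head) {
            /* First overflow: copy the inline values to a heap block. */
            grown = malloc(sizeof(int) * (len + INLINE_SLOTS));
            assert(NULL != grown);
            memcpy(grown, s->base, sizeof(int) * len);
        } else {
            grown = realloc(s->head, sizeof(int) * (len + INLINE_SLOTS));
            assert(NULL != grown);
        }
        s->head = grown;
        s->tail = s->head + toff;
        s->end = s->head + len + INLINE_SLOTS;
    }
    *s->tail++ = v;
}

/* Returns 1 and stores the popped value in out, or 0 if the stack is empty. */
static int
int_stack_pop(IntStack *s, int *out) {
    if (s->head < s->tail) {
        *out = *--s->tail;
        return 1;
    }
    return 0;
}

static void
int_stack_cleanup(IntStack *s) {
    if (s->base != s->head) {
        free(s->head);
    }
}
```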
ox-2.8.2/ext/ox/sax_has.h 0000644 0000041 0000041 00000003177 13203413063 015231 0 ustar www-data www-data /* sax_has.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_SAX_HAS_H__
#define __OX_SAX_HAS_H__
typedef struct _Has {
int instruct;
int end_instruct;
int attr;
int attrs_done;
int attr_value;
int doctype;
int comment;
int cdata;
int text;
int value;
int start_element;
int end_element;
int error;
int pos;
int line;
int column;
} *Has;
inline static int
respond_to(VALUE obj, ID method) {
return rb_respond_to(obj, method);
}
inline static void
has_init(Has has, VALUE handler) {
has->instruct = respond_to(handler, ox_instruct_id);
has->end_instruct = respond_to(handler, ox_end_instruct_id);
has->attr = respond_to(handler, ox_attr_id);
has->attr_value = respond_to(handler, ox_attr_value_id);
has->attrs_done = respond_to(handler, ox_attrs_done_id);
has->doctype = respond_to(handler, ox_doctype_id);
has->comment = respond_to(handler, ox_comment_id);
has->cdata = respond_to(handler, ox_cdata_id);
has->text = respond_to(handler, ox_text_id);
has->value = respond_to(handler, ox_value_id);
has->start_element = respond_to(handler, ox_start_element_id);
has->end_element = respond_to(handler, ox_end_element_id);
has->error = respond_to(handler, ox_error_id);
has->pos = (Qtrue == rb_ivar_defined(handler, ox_at_pos_id));
has->line = (Qtrue == rb_ivar_defined(handler, ox_at_line_id));
has->column = (Qtrue == rb_ivar_defined(handler, ox_at_column_id));
}
#endif /* __OX_SAX_HAS_H__ */
ox-2.8.2/ext/ox/special.h 0000644 0000041 0000041 00000000357 13203413063 015220 0 ustar www-data www-data /* special.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_SPECIAL_H__
#define __OX_SPECIAL_H__
#include <stdint.h>
extern char* ox_ucs_to_utf8_chars(char *text, uint64_t u);
#endif /* __OX_SPECIAL_H__ */
ox-2.8.2/ext/ox/err.h 0000644 0000041 0000041 00000001426 13203413063 014366 0 ustar www-data www-data /* err.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_ERR_H__
#define __OX_ERR_H__
#include "ruby.h"
#define set_error(err, msg, xml, current) _ox_err_set_with_location(err, msg, xml, current, __FILE__, __LINE__)
typedef struct _Err {
VALUE clas;
char msg[128];
} *Err;
extern VALUE ox_arg_error_class;
extern VALUE ox_parse_error_class;
extern void ox_err_set(Err e, VALUE clas, const char *format, ...);
extern void _ox_err_set_with_location(Err err, const char *msg, const char *xml, const char *current, const char* file, int line);
extern void ox_err_raise(Err e);
inline static void
err_init(Err e) {
e->clas = Qnil;
*e->msg = '\0';
}
inline static int
err_has(Err e) {
return (Qnil != e->clas);
}
#endif /* __OX_ERR_H__ */
ox-2.8.2/ext/ox/err.c 0000644 0000041 0000041 00000001531 13203413063 014356 0 ustar www-data www-data /* err.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdarg.h>
#include "err.h"
void
ox_err_set(Err e, VALUE clas, const char *format, ...) {
va_list ap;
va_start(ap, format);
e->clas = clas;
vsnprintf(e->msg, sizeof(e->msg) - 1, format, ap);
va_end(ap);
}
#if __GNUC__ > 4
_Noreturn void
#else
void
#endif
ox_err_raise(Err e) {
rb_raise(e->clas, "%s", e->msg);
}
void
_ox_err_set_with_location(Err err, const char *msg, const char *xml, const char *current, const char* file, int line) {
int xline = 1;
int col = 1;
for (; xml < current && '\n' != *current; current--) {
col++;
}
for (; xml < current; current--) {
if ('\n' == *current) {
xline++;
}
}
ox_err_set(err, ox_parse_error_class, "%s at line %d, column %d [%s:%d]\n", msg, xline, col, file, line);
}
ox-2.8.2/ext/ox/cache8.c 0000644 0000041 0000041 00000004124 13203413063 014722 0 ustar www-data www-data /* cache8.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include "ruby.h"
#include "cache8.h"
#define BITS 4
#define MASK 0x000000000000000FULL
#define SLOT_CNT 16
#define DEPTH 16
typedef union {
struct _Cache8 *child;
slot_t value;
} Bucket;
struct _Cache8 {
Bucket buckets[SLOT_CNT];
};
static void cache8_delete(Cache8 cache, int depth);
//static void slot_print(Cache8 cache, sid_t key, unsigned int depth);
void
ox_cache8_new(Cache8 *cache) {
Bucket *b;
int i;
*cache = ALLOC(struct _Cache8);
for (i = SLOT_CNT, b = (*cache)->buckets; 0 < i; i--, b++) {
b->value = 0;
}
}
void
ox_cache8_delete(Cache8 cache) {
cache8_delete(cache, 0);
}
static void
cache8_delete(Cache8 cache, int depth) {
Bucket *b;
unsigned int i;
for (i = 0, b = cache->buckets; i < SLOT_CNT; i++, b++) {
if (0 != b->child) {
if (DEPTH - 1 != depth) {
cache8_delete(b->child, depth + 1);
}
}
}
xfree(cache);
}
slot_t
ox_cache8_get(Cache8 cache, sid_t key, slot_t **slot) {
Bucket *b;
int i;
sid_t k8 = (sid_t)key;
sid_t k;
for (i = 64 - BITS; 0 < i; i -= BITS) {
k = (k8 >> i) & MASK;
b = cache->buckets + k;
if (0 == b->child) {
ox_cache8_new(&b->child);
}
cache = b->child;
}
*slot = &(cache->buckets + (k8 & MASK))->value;
return **slot;
}
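/* Illustrative sketch, not part of Ox: ox_cache8_get() above walks a 64-bit
* key four bits at a time, creating interior nodes on demand, and returns the
* slot selected by the lowest four bits. The standalone version below shows
* the same walk using calloc. Node, Cell, node_new(), and trie_slot() are
* hypothetical names, and the sketch assumes that zeroed memory reads back as a
* NULL pointer, which holds on mainstream platforms.
*/

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NIBBLE_BITS 4
#define NIBBLE_MASK 0x0FULL
#define FANOUT      16

struct Node;

/* A cell is a child pointer on interior levels and a value on the last level. */
typedef union Cell {
    struct Node *child;
    uint64_t     value;
} Cell;

typedef struct Node {
    Cell cells[FANOUT];
} Node;

static Node*
node_new(void) {
    Node *n = calloc(1, sizeof(Node));

    assert(NULL != n);
    return n;
}

/* Returns a pointer to the value slot for key, creating nodes as needed. */
static uint64_t*
trie_slot(Node *root, uint64_t key) {
    int shift;

    /* The 15 highest nibbles pick interior nodes, most significant first. */
    for (shift = 64 - NIBBLE_BITS; 0 < shift; shift -= NIBBLE_BITS) {
        Cell *c = &root->cells[(key >> shift) & NIBBLE_MASK];

        if (NULL == c->child) {
            c->child = node_new();
        }
        root = c->child;
    }
    /* The lowest nibble picks the value slot in the last node. */
    return &root->cells[key & NIBBLE_MASK].value;
}
```

/* A caller stores a value by writing through the returned pointer, for
* example *trie_slot(root, 0x1234ULL) = 42, and reads it back the same way.
*/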
#if 0
void
ox_cache8_print(Cache8 cache) {
//printf("-------------------------------------------\n");
slot_print(cache, 0, 0);
}
static void
slot_print(Cache8 c, sid_t key, unsigned int depth) {
Bucket *b;
unsigned int i;
sid_t k8 = (sid_t)key;
sid_t k;
for (i = 0, b = c->buckets; i < SLOT_CNT; i++, b++) {
if (0 != b->child) {
k = (k8 << BITS) | i;
//printf("*** key: 0x%016llx depth: %u i: %u\n", k, depth, i);
if (DEPTH - 1 == depth) {
printf("0x%016llx: %4llu\n", (unsigned long long)k, (unsigned long long)b->value);
} else {
slot_print(b->child, k, depth + 1);
}
}
}
}
#endif
ox-2.8.2/ext/ox/cache.h 0000644 0000041 0000041 00000000603 13203413063 014635 0 ustar www-data www-data /* cache.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_CACHE_H__
#define __OX_CACHE_H__
#include "ruby.h"
typedef struct _Cache *Cache;
extern void ox_cache_new(Cache *cache);
extern VALUE ox_cache_get(Cache cache, const char *key, VALUE **slot, const char **keyp);
extern void ox_cache_print(Cache cache);
#endif /* __OX_CACHE_H__ */
ox-2.8.2/ext/ox/builder.c 0000644 0000041 0000041 00000053317 13203413063 015225 0 ustar www-data www-data /* builder.c
* Copyright (c) 2011, 2016 Peter Ohler
* All rights reserved.
*/
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "ox.h"
#include "buf.h"
#include "err.h"
#define MAX_DEPTH 128
typedef struct _Element {
char *name;
char buf[64];
int len;
bool has_child;
bool non_text_child;
} *Element;
typedef struct _Builder {
struct _Buf buf;
int indent;
char encoding[64];
int depth;
FILE *file;
struct _Element stack[MAX_DEPTH];
long line;
long col;
long pos;
} *Builder;
static VALUE builder_class = Qundef;
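#define SPACES_8 "        "
#define SPACES_32 SPACES_8 SPACES_8 SPACES_8 SPACES_8
static const char indent_spaces[] = "\n" SPACES_32 SPACES_32 SPACES_32 SPACES_32; // 128 spaces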
// The : character is equivalent to 10. Used for replacement characters up to
// 10 characters long such as '&#x10FFFF;'. From
// https://www.w3.org/TR/2006/REC-xml11-20060816
#if 0
static const char xml_friendly_chars[257] = "\
:::::::::11::1::::::::::::::::::\
11611156111111111111111111114141\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111";
#endif
// From section 2.3 of the XML 1.1 spec. Everything over 0x20 is allowed
// except <, &, and the double quote; > is escaped as well. Builder uses
// double quotes for attributes.
static const char xml_attr_chars[257] = "\
:::::::::11::1::::::::::::::::::\
11611151111111111111111111114141\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111";
// From section 3.1 of the XML 1.1 spec. Everything over 0x20 is allowed
// except < and &; > is escaped as well.
static const char xml_element_chars[257] = "\
:::::::::11::1::::::::::::::::::\
11111151111111111111111111114141\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111\
11111111111111111111111111111111";
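inline static size_t
xml_str_len(const unsigned char *str, size_t len, const char *table) {
size_t size = 0;
size_t cnt = len; // the loop consumes len, so keep the original count
for (; 0 < len; str++, len--) {
size += table[*str];
}
// Each table entry is '0' plus the escaped length, so remove one '0' per character.
return size - cnt * (size_t)'0';
}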
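/* Illustrative sketch, not part of Ox: the tables above give, for each byte,
* the number of bytes it occupies once escaped, so summing the entries yields
* the escaped length of a string. The version below uses a switch in place of
* the 256-entry tables and covers only the four characters escaped in
* attribute values. escaped_cost() and escaped_len() are hypothetical names.
*/

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes needed to write c in an attribute value. */
static size_t
escaped_cost(unsigned char c) {
    switch (c) {
    case '<':
    case '>':
        return 4; /* &lt; and &gt; */
    case '&':
        return 5; /* &amp; */
    case '"':
        return 6; /* &quot; */
    default:
        return 1; /* written as is */
    }
}

/* Sums the per-character costs; equals len when nothing needs escaping. */
static size_t
escaped_len(const char *str, size_t len) {
    size_t size = 0;
    size_t i;

    assert(NULL != str);
    for (i = 0; i < len; i++) {
        size += escaped_cost((unsigned char)str[i]);
    }
    return size;
}
```

/* When the result equals the input length the string can be copied without
* any per-character work, which is the purpose of the fast path in
* append_string().
*/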
static void
append_indent(Builder b) {
if (0 >= b->indent) {
return;
}
if (b->buf.head < b->buf.tail) {
int cnt = (b->indent * (b->depth + 1)) + 1;
if (sizeof(indent_spaces) <= (size_t)cnt) {
cnt = sizeof(indent_spaces) - 1;
}
buf_append_string(&b->buf, indent_spaces, cnt);
b->line++;
b->col = cnt - 1;
b->pos += cnt;
}
}
static void
append_string(Builder b, const char *str, size_t size, const char *table, bool strip_invalid_chars) {
size_t xsize = xml_str_len((const unsigned char*)str, size, table);
if (size == xsize) {
const char *s = str;
const char *end = str + size;
buf_append_string(&b->buf, str, size);
b->col += size;
s = strchr(s, '\n');
while (NULL != s) {
b->line++;
b->col = end - s;
s = strchr(s + 1, '\n');
}
b->pos += size;
} else {
char buf[256];
char *end = buf + sizeof(buf) - 1;
char *bp = buf;
int i = size;
int fcnt;
for (; '\0' != *str && 0 < i; i--, str++) {
if ('1' == (fcnt = table[(unsigned char)*str])) {
if (end <= bp) {
buf_append_string(&b->buf, buf, bp - buf);
bp = buf;
}
if ('\n' == *str) {
b->line++;
b->col = 1;
} else {
b->col++;
}
b->pos++;
*bp++ = *str;
} else {
b->pos += fcnt - '0';
b->col += fcnt - '0';
if (buf < bp) {
buf_append_string(&b->buf, buf, bp - buf);
bp = buf;
}
switch (*str) {
case '"':
buf_append_string(&b->buf, "&quot;", 6);
break;
case '&':
buf_append_string(&b->buf, "&amp;", 5);
break;
case '\'':
buf_append_string(&b->buf, "&apos;", 6);
break;
case '<':
buf_append_string(&b->buf, "&lt;", 4);
break;
case '>':
buf_append_string(&b->buf, "&gt;", 4);
break;
default:
// Must be one of the invalid characters.
if (!strip_invalid_chars) {
rb_raise(rb_eSyntaxError, "'\\#x%02x' is not a valid XML character.", *str);
}
break;
}
}
}
if (buf < bp) {
buf_append_string(&b->buf, buf, bp - buf);
bp = buf;
}
}
}
static void
append_sym_str(Builder b, VALUE v) {
const char *s;
int len;
switch (rb_type(v)) {
case T_STRING:
s = StringValuePtr(v);
len = RSTRING_LEN(v);
break;
case T_SYMBOL:
s = rb_id2name(SYM2ID(v));
len = strlen(s);
break;
default:
rb_raise(ox_arg_error_class, "expected a Symbol or String");
break;
}
append_string(b, s, len, xml_element_chars, false);
}
static void
i_am_a_child(Builder b, bool is_text) {
if (0 <= b->depth) {
Element e = &b->stack[b->depth];
if (!e->has_child) {
e->has_child = true;
buf_append(&b->buf, '>');
b->col++;
b->pos++;
}
if (!is_text) {
e->non_text_child = true;
}
}
}
static int
append_attr(VALUE key, VALUE value, Builder b) {
buf_append(&b->buf, ' ');
b->col++;
b->pos++;
append_sym_str(b, key);
buf_append_string(&b->buf, "=\"", 2);
b->col += 2;
b->pos += 2;
Check_Type(value, T_STRING);
append_string(b, StringValuePtr(value), (int)RSTRING_LEN(value), xml_attr_chars, false);
buf_append(&b->buf, '"');
b->col++;
b->pos++;
return ST_CONTINUE;
}
static void
init(Builder b, int fd, int indent, long initial_size) {
buf_init(&b->buf, fd, initial_size);
b->indent = indent;
*b->encoding = '\0';
b->depth = -1;
b->line = 1;
b->col = 1;
b->pos = 0;
}
static void
builder_free(void *ptr) {
Builder b;
Element e;
int d;
if (0 == ptr) {
return;
}
b = (Builder)ptr;
buf_cleanup(&b->buf);
// Elements 0 through depth are open, so include the one at index depth.
for (e = b->stack, d = b->depth; 0 <= d; d--, e++) {
if (e->name != e->buf) {
free(e->name);
}
}
xfree(ptr);
}
static void
pop(Builder b) {
Element e;
if (0 > b->depth) {
rb_raise(ox_arg_error_class, "closed too many elements");
}
e = &b->stack[b->depth];
b->depth--;
if (e->has_child) {
if (e->non_text_child) {
append_indent(b);
}
buf_append_string(&b->buf, "</", 2);
buf_append_string(&b->buf, e->name, e->len);
buf_append(&b->buf, '>');
b->col += e->len + 3;
b->pos += e->len + 3;
if (e->buf != e->name) {
free(e->name);
e->name = 0;
}
} else {
buf_append_string(&b->buf, "/>", 2);
b->col += 2;
b->pos += 2;
}
}
static void
bclose(Builder b) {
while (0 <= b->depth) {
pop(b);
}
if (0 <= b->indent) {
buf_append(&b->buf, '\n');
}
b->line++;
b->col = 1;
b->pos++;
buf_finish(&b->buf);
if (NULL != b->file) {
fclose(b->file);
}
}
static VALUE
to_s(Builder b) {
volatile VALUE rstr;
if (0 != b->buf.fd) {
rb_raise(ox_arg_error_class, "can not create a String with a stream or file builder.");
}
if (0 <= b->indent && '\n' != *(b->buf.tail - 1)) {
buf_append(&b->buf, '\n');
b->line++;
b->col = 1;
b->pos++;
}
*b->buf.tail = '\0'; // for debugging
rstr = rb_str_new(b->buf.head, buf_len(&b->buf));
if ('\0' != *b->encoding) {
#if HAS_ENCODING_SUPPORT
rb_enc_associate(rstr, rb_enc_find(b->encoding));
#endif
}
return rstr;
}
/* call-seq: new(options)
*
* Creates a new Builder that will write to a string that can be retrieved with
* the to_s() method. If a block is given it is executed with a single parameter
* which is the builder instance. The return value is then the generated string.
*
* - +options+ - (Hash) formatting options
* - +:indent+ (Fixnum) indentation level, a negative value excludes the terminating newline
* - +:size+ (Fixnum) the initial size of the string buffer
*/
static VALUE
builder_new(int argc, VALUE *argv, VALUE self) {
Builder b = ALLOC(struct _Builder);
int indent = ox_default_options.indent;
long buf_size = 0;
if (1 == argc) {
volatile VALUE v;
rb_check_type(*argv, T_HASH);
if (Qnil != (v = rb_hash_lookup(*argv, ox_indent_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":indent must be a fixnum.\n");
}
indent = NUM2INT(v);
}
if (Qnil != (v = rb_hash_lookup(*argv, ox_size_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":size must be a fixnum.\n");
}
buf_size = NUM2LONG(v);
}
}
b->file = NULL;
init(b, 0, indent, buf_size);
if (rb_block_given_p()) {
volatile VALUE rb = Data_Wrap_Struct(builder_class, NULL, builder_free, b);
rb_yield(rb);
bclose(b);
return to_s(b);
} else {
return Data_Wrap_Struct(builder_class, NULL, builder_free, b);
}
}
/* call-seq: file(filename, options)
*
* Creates a new Builder that will write to a file.
*
* - +filename+ (String) filename to write to
* - +options+ - (Hash) formatting options
* - +:indent+ (Fixnum) indentation level, negative values exclude the terminating newline
* - +:size+ (Fixnum) the initial size of the string buffer
*/
static VALUE
builder_file(int argc, VALUE *argv, VALUE self) {
Builder b = ALLOC(struct _Builder);
int indent = ox_default_options.indent;
long buf_size = 0;
FILE *f;
if (1 > argc) {
rb_raise(ox_arg_error_class, "missing filename");
}
Check_Type(*argv, T_STRING);
if (NULL == (f = fopen(StringValuePtr(*argv), "w"))) {
xfree(b);
rb_raise(rb_eIOError, "%s\n", strerror(errno));
}
if (2 == argc) {
volatile VALUE v;
rb_check_type(argv[1], T_HASH);
if (Qnil != (v = rb_hash_lookup(argv[1], ox_indent_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":indent must be a fixnum.\n");
}
indent = NUM2INT(v);
}
if (Qnil != (v = rb_hash_lookup(argv[1], ox_size_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":size must be a fixnum.\n");
}
buf_size = NUM2LONG(v);
}
}
b->file = f;
init(b, fileno(f), indent, buf_size);
if (rb_block_given_p()) {
volatile VALUE rb = Data_Wrap_Struct(builder_class, NULL, builder_free, b);
rb_yield(rb);
bclose(b);
return Qnil;
} else {
return Data_Wrap_Struct(builder_class, NULL, builder_free, b);
}
}
/* call-seq: io(io, options)
*
* Creates a new Builder that will write to an IO instance.
*
* - +io+ (IO) IO to write to, must respond to fileno
* - +options+ - (Hash) formatting options
* - +:indent+ (Fixnum) indentation level, negative values exclude the terminating newline
* - +:size+ (Fixnum) the initial size of the string buffer
*/
static VALUE
builder_io(int argc, VALUE *argv, VALUE self) {
Builder b = ALLOC(struct _Builder);
int indent = ox_default_options.indent;
long buf_size = 0;
int fd;
volatile VALUE v;
if (1 > argc) {
rb_raise(ox_arg_error_class, "missing IO object");
}
if (!rb_respond_to(*argv, ox_fileno_id) ||
Qnil == (v = rb_funcall(*argv, ox_fileno_id, 0)) ||
0 == (fd = FIX2INT(v))) {
rb_raise(rb_eIOError, "expected an IO that has a fileno.");
}
if (2 == argc) {
volatile VALUE v;
rb_check_type(argv[1], T_HASH);
if (Qnil != (v = rb_hash_lookup(argv[1], ox_indent_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":indent must be a fixnum.\n");
}
indent = NUM2INT(v);
}
if (Qnil != (v = rb_hash_lookup(argv[1], ox_size_sym))) {
#ifdef RUBY_INTEGER_UNIFICATION
if (rb_cInteger != rb_obj_class(v)) {
#else
if (rb_cFixnum != rb_obj_class(v)) {
#endif
rb_raise(ox_parse_error_class, ":size must be a fixnum.\n");
}
buf_size = NUM2LONG(v);
}
}
b->file = NULL;
init(b, fd, indent, buf_size);
if (rb_block_given_p()) {
volatile VALUE rb = Data_Wrap_Struct(builder_class, NULL, builder_free, b);
rb_yield(rb);
bclose(b);
return Qnil;
} else {
return Data_Wrap_Struct(builder_class, NULL, builder_free, b);
}
}
/* call-seq: instruct(decl,options)
*
* Adds a processing instruction, typically the XML declaration.
*
* - +decl+ - (String) 'xml' expected
* - +options+ - (Hash) :version, :encoding, and :standalone values, each a String
*/
static VALUE
builder_instruct(int argc, VALUE *argv, VALUE self) {
Builder b = (Builder)DATA_PTR(self);
i_am_a_child(b, false);
append_indent(b);
if (0 == argc) {
buf_append_string(&b->buf, "<?xml?>", 7);
b->col += 7;
b->pos += 7;
} else {
volatile VALUE v;
buf_append_string(&b->buf, "<?", 2);
b->col += 2;
b->pos += 2;
append_sym_str(b, *argv);
if (1 < argc && rb_cHash == rb_obj_class(argv[1])) {
int len;
if (Qnil != (v = rb_hash_lookup(argv[1], ox_version_sym))) {
if (rb_cString != rb_obj_class(v)) {
rb_raise(ox_parse_error_class, ":version must be a String.\n");
}
len = (int)RSTRING_LEN(v);
buf_append_string(&b->buf, " version=\"", 10);
buf_append_string(&b->buf, StringValuePtr(v), len);
buf_append(&b->buf, '"');
b->col += len + 11;
b->pos += len + 11;
}
if (Qnil != (v = rb_hash_lookup(argv[1], ox_encoding_sym))) {
if (rb_cString != rb_obj_class(v)) {
rb_raise(ox_parse_error_class, ":encoding must be a String.\n");
}
len = (int)RSTRING_LEN(v);
buf_append_string(&b->buf, " encoding=\"", 11);
buf_append_string(&b->buf, StringValuePtr(v), len);
buf_append(&b->buf, '"');
b->col += len + 12;
b->pos += len + 12;
strncpy(b->encoding, StringValuePtr(v), sizeof(b->encoding));
b->encoding[sizeof(b->encoding) - 1] = '\0';
}
if (Qnil != (v = rb_hash_lookup(argv[1], ox_standalone_sym))) {
if (rb_cString != rb_obj_class(v)) {
rb_raise(ox_parse_error_class, ":standalone must be a String.\n");
}
len = (int)RSTRING_LEN(v);
buf_append_string(&b->buf, " standalone=\"", 13);
buf_append_string(&b->buf, StringValuePtr(v), len);
buf_append(&b->buf, '"');
b->col += len + 14;
b->pos += len + 14;
}
}
buf_append_string(&b->buf, "?>", 2);
b->col += 2;
b->pos += 2;
}
return Qnil;
}
/* call-seq: element(name,attributes)
*
* Adds an element with the name and attributes provided. If a block is given
* then a pop() is done when the block returns.
*
* - +name+ - (String) name of the element
* - +attributes+ - (Hash) of the element
*/
static VALUE
builder_element(int argc, VALUE *argv, VALUE self) {
Builder b = (Builder)DATA_PTR(self);
Element e;
const char *name;
int len;
if (1 > argc) {
rb_raise(ox_arg_error_class, "missing element name");
}
i_am_a_child(b, false);
append_indent(b);
b->depth++;
if (MAX_DEPTH <= b->depth) {
rb_raise(ox_arg_error_class, "XML too deeply nested");
}
switch (rb_type(*argv)) {
case T_STRING:
name = StringValuePtr(*argv);
len = RSTRING_LEN(*argv);
break;
case T_SYMBOL:
name = rb_id2name(SYM2ID(*argv));
len = strlen(name);
break;
default:
rb_raise(ox_arg_error_class, "expected a Symbol or String for an element name");
break;
}
e = &b->stack[b->depth];
if (sizeof(e->buf) <= (size_t)len) {
e->name = strdup(name);
*e->buf = '\0';
} else {
strcpy(e->buf, name);
e->name = e->buf;
}
e->len = len;
e->has_child = false;
e->non_text_child = false;
buf_append(&b->buf, '<');
b->col++;
b->pos++;
append_string(b, e->name, len, xml_element_chars, false);
if (1 < argc) {
rb_hash_foreach(argv[1], append_attr, (VALUE)b);
}
// Do not close with > or /> yet. That is done with i_am_a_child() or pop().
if (rb_block_given_p()) {
rb_yield(self);
pop(b);
}
return Qnil;
}
/* call-seq: comment(text)
*
* Adds a comment element to the XML string being formed.
* - +text+ - (String) contents of the comment
*/
static VALUE
builder_comment(VALUE self, VALUE text) {
Builder b = (Builder)DATA_PTR(self);
rb_check_type(text, T_STRING);
i_am_a_child(b, false);
append_indent(b);
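buf_append_string(&b->buf, "<!-- ", 5);
/* ... the rest of builder.c and the beginning of dump.c are missing from this extract ... */
} else if (ox_comment_clas == clas) {
dump_gen_val_node(*np, d2, "<!-- ", 5, " -->", 4, out);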
} else if (ox_raw_clas == clas) {
dump_gen_val_node(*np, d2, "", 0, "", 0, out);
} else if (ox_cdata_clas == clas) {
dump_gen_val_node(*np, d2, "<![CDATA[", 9, "]]>", 3, out);
} else if (ox_doctype_clas == clas) {
dump_gen_val_node(*np, d2, "<!DOCTYPE ", 10, " >", 2, out);
} else {
rb_raise(rb_eTypeError, "Unexpected class, %s, while dumping generic XML\n", rb_class2name(clas));
}
}
}
return indent_needed;
}
static int
dump_gen_attr(VALUE key, VALUE value, Out out) {
const char *ks;
size_t klen;
size_t size;
#if HAS_PRIVATE_ENCODING
// There seems to be a bug in jruby for converting symbols to strings and preserving the encoding. This is a work
// around.
ks = rb_str_ptr(rb_String(key));
#else
switch (rb_type(key)) {
case T_SYMBOL:
ks = rb_id2name(SYM2ID(key));
break;
case T_STRING:
ks = StringValuePtr(key);
break;
default:
key = rb_String(key);
ks = StringValuePtr(key);
break;
}
#endif
klen = strlen(ks);
value = rb_String(value);
size = 4 + klen + RSTRING_LEN(value);
if (out->end - out->cur <= (long)size) {
grow(out, size);
}
*out->cur++ = ' ';
fill_value(out, ks, klen);
*out->cur++ = '=';
*out->cur++ = '"';
dump_str_value(out, StringValuePtr(value), RSTRING_LEN(value), xml_quote_chars);
*out->cur++ = '"';
return ST_CONTINUE;
}
static void
dump_gen_val_node(VALUE obj, int depth,
const char *pre, size_t plen,
const char *suf, size_t slen, Out out) {
volatile VALUE v = rb_attr_get(obj, ox_at_value_id);
const char *val;
size_t vlen;
size_t size;
int indent;
if (T_STRING != rb_type(v)) {
return;
}
val = StringValuePtr(v);
vlen = RSTRING_LEN(v);
if (0 > out->indent) {
indent = -1;
} else if (0 == out->indent) {
indent = 0;
} else {
indent = depth * out->indent;
}
size = indent + plen + slen + vlen + out->opts->margin_len;
if (out->end - out->cur <= (long)size) {
grow(out, size);
}
fill_indent(out, indent);
fill_value(out, pre, plen);
fill_value(out, val, vlen);
fill_value(out, suf, slen);
*out->cur = '\0';
}
static void
dump_obj_to_xml(VALUE obj, Options copts, Out out) {
VALUE clas = rb_obj_class(obj);
out->w_time = (Yes == copts->xsd_date) ? dump_time_xsd : dump_time_thin;
out->buf = ALLOC_N(char, 65336);
out->end = out->buf + 65325; /* 10 less than end plus extra for possible errors */
out->cur = out->buf;
out->circ_cache = 0;
out->circ_cnt = 0;
out->opts = copts;
out->obj = obj;
if (Yes == copts->circular) {
ox_cache8_new(&out->circ_cache);
}
out->indent = copts->indent;
if (ox_document_clas == clas) {
dump_gen_doc(obj, -1, out);
} else if (ox_element_clas == clas) {
dump_gen_element(obj, 0, out);
} else {
out->w_start = dump_start;
out->w_end = dump_end;
dump_first_obj(obj, out);
}
dump_value(out, "\n", 1);
if (Yes == copts->circular) {
ox_cache8_delete(out->circ_cache);
}
}
char*
ox_write_obj_to_str(VALUE obj, Options copts) {
struct _Out out;
dump_obj_to_xml(obj, copts, &out);
return out.buf;
}
void
ox_write_obj_to_file(VALUE obj, const char *path, Options copts) {
struct _Out out;
size_t size;
FILE *f;
dump_obj_to_xml(obj, copts, &out);
size = out.cur - out.buf;
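if (0 == (f = fopen(path, "w"))) {
xfree(out.buf); /* release the dump buffer before raising */
rb_raise(rb_eIOError, "%s\n", strerror(errno));
}
if (size != fwrite(out.buf, 1, size, f)) {
int err = errno; /* ferror() only returns a flag, errno carries the error code */
xfree(out.buf);
fclose(f);
rb_raise(rb_eIOError, "Write failed. [%d:%s]\n", err, strerror(err));
}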
xfree(out.buf);
fclose(f);
}
ox-2.8.2/ext/ox/special.c
/* special.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include "special.h"
/*
u0000..u007F                00000000000000xxxxxxx  0xxxxxxx
u0080..u07FF                0000000000yyyyyxxxxxx  110yyyyy 10xxxxxx
u0800..uD7FF, uE000..uFFFF  00000zzzzyyyyyyxxxxxx  1110zzzz 10yyyyyy 10xxxxxx
u10000..u10FFFF             uuuzzzzzzyyyyyyxxxxxx  11110uuu 10zzzzzz 10yyyyyy 10xxxxxx
*/
char*
ox_ucs_to_utf8_chars(char *text, uint64_t u) {
int reading = 0;
int i;
unsigned char c;
if (u <= 0x000000000000007FULL) {
/* 0xxxxxxx */
*text++ = (char)u;
} else if (u <= 0x00000000000007FFULL) {
/* 110yyyyy 10xxxxxx */
*text++ = (char)(0x00000000000000C0ULL | (0x000000000000001FULL & (u >> 6)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & u));
} else if (u <= 0x000000000000D7FFULL || (0x000000000000E000ULL <= u && u <= 0x000000000000FFFFULL)) {
/* 1110zzzz 10yyyyyy 10xxxxxx */
*text++ = (char)(0x00000000000000E0ULL | (0x000000000000000FULL & (u >> 12)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & (u >> 6)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & u));
} else if (0x0000000000010000ULL <= u && u <= 0x000000000010FFFFULL) {
/* 11110uuu 10zzzzzz 10yyyyyy 10xxxxxx */
*text++ = (char)(0x00000000000000F0ULL | (0x0000000000000007ULL & (u >> 18)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & (u >> 12)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & (u >> 6)));
*text++ = (char)(0x0000000000000080ULL | (0x000000000000003FULL & u));
} else {
/* assume it is UTF-8 encoded directly and not UCS */
for (i = 56; 0 <= i; i -= 8) {
c = (unsigned char)((u >> i) & 0x00000000000000FFULL);
if (reading) {
*text++ = (char)c;
} else if ('\0' != c) {
*text++ = (char)c;
reading = 1;
}
}
}
return text;
}
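/* Illustrative sketch, not part of Ox: the byte layouts in the table at the
 * top of this file can be exercised with a small standalone encoder. The
 * name sketch_utf8_encode is hypothetical. Unlike ox_ucs_to_utf8_chars(),
 * which copies out-of-range values through as raw bytes, this version
 * returns 0 for surrogates and for values above U+10FFFF.
 */
```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point into out (at least 4 bytes). Returns the number of
 * bytes written, or 0 if cp is not a valid Unicode scalar value. */
static size_t sketch_utf8_encode(uint32_t cp, unsigned char *out) {
    if (cp <= 0x7F) {
        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        /* 110yyyyy 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (0xD800 <= cp && cp <= 0xDFFF) {
            return 0; /* surrogate range is not encodable */
        }
        /* 1110zzzz 10yyyyyy 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* 11110uuu 10zzzzzz 10yyyyyy 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```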
ox-2.8.2/ext/ox/attr.h
/* attr.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_ATTR_H__
#define __OX_ATTR_H__
#include "ox.h"
#define ATTR_STACK_INC 8
typedef struct _Attr {
const char *name;
const char *value;
} *Attr;
typedef struct _AttrStack {
struct _Attr base[ATTR_STACK_INC];
Attr head; /* current stack */
Attr end; /* stack end */
Attr tail; /* pointer to one past last element name on stack */
} *AttrStack;
inline static void
attr_stack_init(AttrStack stack) {
stack->head = stack->base;
stack->end = stack->base + sizeof(stack->base) / sizeof(struct _Attr);
stack->tail = stack->head;
stack->head->name = 0;
}
inline static int
attr_stack_empty(AttrStack stack) {
return (stack->head == stack->tail);
}
inline static void
attr_stack_cleanup(AttrStack stack) {
if (stack->base != stack->head) {
xfree(stack->head);
stack->head = stack->base;
}
}
inline static void
attr_stack_push(AttrStack stack, const char *name, const char *value) {
if (stack->end <= stack->tail + 1) {
size_t len = stack->end - stack->head;
size_t toff = stack->tail - stack->head;
if (stack->base == stack->head) {
stack->head = ALLOC_N(struct _Attr, len + ATTR_STACK_INC);
memcpy(stack->head, stack->base, sizeof(struct _Attr) * len);
} else {
REALLOC_N(stack->head, struct _Attr, len + ATTR_STACK_INC);
}
stack->tail = stack->head + toff;
stack->end = stack->head + len + ATTR_STACK_INC;
}
stack->tail->name = name;
stack->tail->value = value;
stack->tail++;
stack->tail->name = 0; // terminate
}
inline static Attr
attr_stack_peek(AttrStack stack) {
if (stack->head < stack->tail) {
return stack->tail - 1;
}
return 0;
}
inline static Attr
attr_stack_pop(AttrStack stack) {
if (stack->head < stack->tail) {
stack->tail--;
return stack->tail;
}
return 0;
}
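/* Illustrative sketch, not part of Ox: the attribute stack above starts in
 * an inline array and only moves to the heap when that array fills up. The
 * standalone version below shows the same pattern for ints using plain
 * malloc/realloc instead of Ruby's ALLOC_N/REALLOC_N. All sketch_* names
 * are hypothetical, and no terminator slot is reserved, so the growth test
 * is "end <= tail" rather than "end <= tail + 1".
 */
```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACK_INC 4

typedef struct {
    int  base[SKETCH_STACK_INC]; /* inline storage used until it overflows */
    int *head;                   /* current storage, base or heap */
    int *end;                    /* one past the last usable slot */
    int *tail;                   /* one past the last pushed value */
} SketchStack;

static void sketch_stack_init(SketchStack *s) {
    s->head = s->base;
    s->end = s->base + SKETCH_STACK_INC;
    s->tail = s->head;
}

/* Returns nonzero once the values live on the heap. */
static int sketch_stack_spilled(const SketchStack *s) {
    return s->head != s->base;
}

/* Pushes v, growing by SKETCH_STACK_INC slots when full.
 * Returns 0 on success and -1 if memory could not be allocated. */
static int sketch_stack_push(SketchStack *s, int v) {
    if (s->end <= s->tail) {
        size_t len = (size_t)(s->end - s->head);
        size_t toff = (size_t)(s->tail - s->head);
        int   *grown;

        if (s->head == s->base) {
            /* first overflow: copy the inline values to a heap block */
            grown = malloc(sizeof(int) * (len + SKETCH_STACK_INC));
            if (NULL == grown) {
                return -1;
            }
            memcpy(grown, s->base, sizeof(int) * len);
        } else {
            grown = realloc(s->head, sizeof(int) * (len + SKETCH_STACK_INC));
            if (NULL == grown) {
                return -1;
            }
        }
        s->head = grown;
        s->tail = grown + toff;
        s->end = grown + len + SKETCH_STACK_INC;
    }
    *s->tail++ = v;
    return 0;
}

/* Frees heap storage, if any, and returns to the inline array. */
static void sketch_stack_cleanup(SketchStack *s) {
    if (s->head != s->base) {
        free(s->head);
    }
    sketch_stack_init(s);
}
```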
#endif /* __OX_ATTR_H__ */
ox-2.8.2/ext/ox/hash_load.c
/* hash_load.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdlib.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include "ruby.h"
#include "ox.h"
// The approach taken for the hash and hash_no_attrs parsing is to push just
// the key onto the stack and then decide what to do on the way up/out.
static VALUE
create_top(PInfo pi) {
volatile VALUE top = rb_hash_new();
helper_stack_push(&pi->helpers, 0, top, HashCode);
pi->obj = top;
return top;
}
static void
add_text(PInfo pi, char *text, int closed) {
Helper parent = helper_stack_peek(&pi->helpers);
volatile VALUE s = rb_str_new2(text);
volatile VALUE a;
#if HAS_ENCODING_SUPPORT
if (0 != pi->options->rb_enc) {
rb_enc_associate(s, pi->options->rb_enc);
}
#elif HAS_PRIVATE_ENCODING
if (Qnil != pi->options->rb_enc) {
rb_funcall(s, ox_force_encoding_id, 1, pi->options->rb_enc);
}
#endif
switch (parent->type) {
case NoCode:
parent->obj = s;
parent->type = StringCode;
break;
case ArrayCode:
rb_ary_push(parent->obj, s);
break;
default:
a = rb_ary_new();
rb_ary_push(a, parent->obj);
rb_ary_push(a, s);
parent->obj = a;
parent->type = ArrayCode;
break;
}
}
static void
add_element(PInfo pi, const char *ename, Attr attrs, int hasChildren) {
if (helper_stack_empty(&pi->helpers)) {
create_top(pi);
}
if (NULL != attrs && NULL != attrs->name) {
volatile VALUE h = rb_hash_new();
volatile VALUE key;
volatile VALUE val;
volatile VALUE a;
for (; 0 != attrs->name; attrs++) {
if (Yes == pi->options->sym_keys) {
key = rb_id2sym(rb_intern(attrs->name));
} else {
key = rb_str_new2(attrs->name);
}
val = rb_str_new2(attrs->value);
#if HAS_ENCODING_SUPPORT
if (0 != pi->options->rb_enc) {
rb_enc_associate(val, pi->options->rb_enc);
}
#elif HAS_PRIVATE_ENCODING
if (Qnil != pi->options->rb_enc) {
rb_funcall(val, ox_force_encoding_id, 1, pi->options->rb_enc);
}
#endif
rb_hash_aset(h, key, val);
}
a = rb_ary_new();
rb_ary_push(a, h);
rb_obj_taint(a); // flag indicating it is a unit, kind of a hack but it works
helper_stack_push(&pi->helpers, rb_intern(ename), a, ArrayCode);
} else {
helper_stack_push(&pi->helpers, rb_intern(ename), Qnil, NoCode);
}
}
static void
add_element_no_attrs(PInfo pi, const char *ename, Attr attrs, int hasChildren) {
if (helper_stack_empty(&pi->helpers)) {
create_top(pi);
}
helper_stack_push(&pi->helpers, rb_intern(ename), Qnil, NoCode);
}
static int
untaint_hash_cb(VALUE key, VALUE value, VALUE x) {
if (Qtrue == rb_obj_tainted(value)) {
rb_obj_untaint(value);
}
return ST_CONTINUE;
}
static void
end_element_core(PInfo pi, const char *ename, bool check_taint) {
Helper e = helper_stack_pop(&pi->helpers);
Helper parent = helper_stack_peek(&pi->helpers);
volatile VALUE pobj = parent->obj;
volatile VALUE found = Qundef;
volatile VALUE key;
volatile VALUE a;
if (NoCode == e->type) {
e->obj = Qnil;
}
if (Yes == pi->options->sym_keys) {
key = rb_id2sym(e->var);
} else {
key = rb_id2str(e->var);
}
// Make sure the parent is a Hash. If not set then make a Hash. If an
// Array or non-Hash then append to array or create and append.
switch (parent->type) {
case NoCode:
pobj = rb_hash_new();
parent->obj = pobj;
parent->type = HashCode;
break;
case ArrayCode:
pobj = rb_hash_new();
rb_ary_push(parent->obj, pobj);
break;
case HashCode:
found = rb_hash_lookup2(parent->obj, key, Qundef);
break;
default:
a = rb_ary_new();
rb_ary_push(a, parent->obj);
pobj = rb_hash_new();
rb_ary_push(a, pobj);
parent->obj = a;
parent->type = ArrayCode;
break;
}
if (Qundef == found) {
rb_hash_aset(pobj, key, e->obj);
} else if (RUBY_T_ARRAY == rb_type(found)) {
if (check_taint && Qtrue == rb_obj_tainted(found)) {
rb_obj_untaint(found);
a = rb_ary_new();
rb_ary_push(a, found);
rb_ary_push(a, e->obj);
rb_hash_aset(pobj, key, a);
} else {
rb_ary_push(found, e->obj);
}
} else { // something there other than an array
if (check_taint && Qtrue == rb_obj_tainted(e->obj)) {
rb_obj_untaint(e->obj);
}
a = rb_ary_new();
rb_ary_push(a, found);
rb_ary_push(a, e->obj);
rb_hash_aset(pobj, key, a);
}
if (check_taint && RUBY_T_HASH == rb_type(e->obj)) {
rb_hash_foreach(e->obj, untaint_hash_cb, Qnil);
}
}
static void
end_element(PInfo pi, const char *ename) {
end_element_core(pi, ename, true);
}
static void
end_element_no_attrs(PInfo pi, const char *ename) {
end_element_core(pi, ename, false);
}
static void
finish(PInfo pi) {
if (Qnil != pi->obj && RUBY_T_HASH == rb_type(pi->obj)) {
rb_hash_foreach(pi->obj, untaint_hash_cb, Qnil);
}
}
struct _ParseCallbacks _ox_hash_callbacks = {
NULL,
NULL,
NULL,
NULL,
add_text,
add_element,
end_element,
finish,
};
ParseCallbacks ox_hash_callbacks = &_ox_hash_callbacks;
struct _ParseCallbacks _ox_hash_no_attrs_callbacks = {
NULL,
NULL,
NULL,
NULL,
add_text,
add_element_no_attrs,
end_element_no_attrs,
NULL,
};
ParseCallbacks ox_hash_no_attrs_callbacks = &_ox_hash_no_attrs_callbacks;
ox-2.8.2/ext/ox/encode.h
/* encode.h
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#ifndef __OX_ENCODE_H__
#define __OX_ENCODE_H__
#include "ruby.h"
#if HAS_ENCODING_SUPPORT
#include "ruby/encoding.h"
#endif
static inline VALUE
ox_encode(VALUE rstr) {
#if HAS_ENCODING_SUPPORT
rb_enc_associate(rstr, ox_utf8_encoding);
#else
if (Qnil != ox_utf8_encoding) {
rstr = rb_funcall(ox_utf8_encoding, ox_iconv_id, 1, rstr);
}
#endif
return rstr;
}
#endif /* __OX_ENCODE_H__ */
ox-2.8.2/ext/ox/parse.c
/* parse.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include "ruby.h"
#include "ox.h"
#include "err.h"
#include "attr.h"
#include "helper.h"
#include "special.h"
static void read_instruction(PInfo pi);
static void read_doctype(PInfo pi);
static void read_comment(PInfo pi);
static char* read_element(PInfo pi);
static void read_text(PInfo pi);
/*static void read_reduced_text(PInfo pi); */
static void read_cdata(PInfo pi);
static char* read_name_token(PInfo pi);
static char* read_quoted_value(PInfo pi);
static char* read_hex_uint64(char *b, uint64_t *up);
static char* read_10_uint64(char *b, uint64_t *up);
static char* read_coded_chars(PInfo pi, char *text);
static void next_non_white(PInfo pi);
static int collapse_special(PInfo pi, char *str);
/* This XML parser is a single pass, destructive, callback parser. It is a
* single pass parser since it only makes one pass over the characters in the
* XML document string. It is destructive because it re-uses the content of
* the string for values in the callbacks and places \0 characters at various
* places to mark the end of tokens and strings. It is a callback parser like
* a SAX parser because it invokes callbacks when document elements are
* encountered.
*
* Parsing is very tolerant. Lack of headers and even misspelled element
* endings are passed over without raising an error. A best attempt is made in
* all cases to parse the string.
*/
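/* Illustrative sketch, not part of Ox: "destructive" and "callback" in the
 * description above can be seen in miniature in a splitter that overwrites
 * each separator with '\0' and hands the callback a pointer into the
 * caller's own buffer, so no token is ever copied. The names sketch_split,
 * SketchWordCb, and sketch_sum_len are hypothetical.
 */
```c
#include <stddef.h>
#include <string.h>

typedef void (*SketchWordCb)(const char *word, void *ctx);

/* Splits buf on spaces in a single pass. Each word is terminated in place
 * and passed to cb. Returns the number of words found. */
static int sketch_split(char *buf, SketchWordCb cb, void *ctx) {
    int   count = 0;
    char *s = buf;

    while ('\0' != *s) {
        char *start;

        while (' ' == *s) {
            s++; /* skip leading separators */
        }
        if ('\0' == *s) {
            break;
        }
        start = s;
        while ('\0' != *s && ' ' != *s) {
            s++;
        }
        if ('\0' != *s) {
            *s++ = '\0'; /* destructive: terminate the token inside buf */
        }
        cb(start, ctx);
        count++;
    }
    return count;
}

/* Example callback: adds the length of each word to a size_t total. */
static void sketch_sum_len(const char *word, void *ctx) {
    *(size_t *)ctx += strlen(word);
}
```
/* After the call the buffer no longer holds the original text, which is why
 * a caller that still needs the input has to keep its own copy.
 */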
static char xml_valid_lower_chars[34] = "xxxxxxxxxooxxoxxxxxxxxxxxxxxxxxxo";
inline static int
is_white(char c) {
switch (c) {
case ' ':
case '\t':
case '\f':
case '\n':
case '\r':
return 1;
default:
return 0;
}
}
inline static void
next_non_white(PInfo pi) {
for (; 1; pi->s++) {
switch (*pi->s) {
case ' ':
case '\t':
case '\f':
case '\n':
case '\r':
break;
default:
return;
}
}
}
inline static void
next_white(PInfo pi) {
for (; 1; pi->s++) {
switch (*pi->s) {
case ' ':
case '\t':
case '\f':
case '\n':
case '\r':
case '\0':
return;
default:
break;
}
}
}
static void
mark_pi_cb(void *ptr) {
if (NULL != ptr) {
HelperStack stack = &((PInfo)ptr)->helpers;
Helper h;
for (h = stack->head; h < stack->tail; h++) {
if (NoCode != h->type) {
rb_gc_mark(h->obj);
}
}
}
}
VALUE
ox_parse(char *xml, ParseCallbacks pcb, char **endp, Options options, Err err) {
struct _PInfo pi;
int body_read = 0;
int block_given = rb_block_given_p();
volatile VALUE wrap;
if (0 == xml) {
set_error(err, "Invalid arg, xml string can not be null", xml, 0);
return Qnil;
}
if (DEBUG <= options->trace) {
printf("Parsing xml:\n%s\n", xml);
}
/* initialize parse info */
helper_stack_init(&pi.helpers);
// Protect against GC
wrap = Data_Wrap_Struct(rb_cObject, mark_pi_cb, NULL, &pi);
err_init(&pi.err);
pi.str = xml;
pi.s = xml;
pi.pcb = pcb;
pi.obj = Qnil;
pi.circ_array = 0;
pi.options = options;
while (1) {
next_non_white(&pi); /* skip white space */
if ('\0' == *pi.s) {
break;
}
if (body_read && 0 != endp) {
*endp = pi.s;
break;
}
if ('<' != *pi.s) { /* all top level entities start with < */
set_error(err, "invalid format, expected <", pi.str, pi.s);
helper_stack_cleanup(&pi.helpers);
return Qnil;
}
pi.s++; /* past < */
switch (*pi.s) {
case '?': /* processing instruction */
pi.s++;
read_instruction(&pi);
break;
case '!': /* comment or doctype */
pi.s++;
if ('\0' == *pi.s) {
set_error(err, "invalid format, DOCTYPE or comment not terminated", pi.str, pi.s);
helper_stack_cleanup(&pi.helpers);
return Qnil;
} else if ('-' == *pi.s) {
pi.s++; /* skip - */
if ('-' != *pi.s) {
set_error(err, "invalid format, bad comment format", pi.str, pi.s);
helper_stack_cleanup(&pi.helpers);
return Qnil;
} else {
pi.s++; /* skip second - */
read_comment(&pi);
}
} else if ((TolerantEffort == options->effort) ? 0 == strncasecmp("DOCTYPE", pi.s, 7) : 0 == strncmp("DOCTYPE", pi.s, 7)) {
pi.s += 7;
read_doctype(&pi);
} else {
set_error(err, "invalid format, DOCTYPE or comment expected", pi.str, pi.s);
helper_stack_cleanup(&pi.helpers);
return Qnil;
}
break;
case '\0':
set_error(err, "invalid format, document not terminated", pi.str, pi.s);
helper_stack_cleanup(&pi.helpers);
return Qnil;
default:
read_element(&pi);
body_read = 1;
break;
}
if (err_has(&pi.err)) {
*err = pi.err;
helper_stack_cleanup(&pi.helpers);
return Qnil;
}
if (block_given && Qnil != pi.obj && Qundef != pi.obj) {
if (NULL != pcb->finish) {
pcb->finish(&pi);
}
rb_yield(pi.obj);
}
}
DATA_PTR(wrap) = NULL;
helper_stack_cleanup(&pi.helpers);
if (NULL != pcb->finish) {
pcb->finish(&pi);
}
return pi.obj;
}
static char*
gather_content(const char *src, char *content, size_t len) {
for (; 0 < len; src++, content++, len--) {
switch (*src) {
case '?':
if ('>' == *(src + 1)) {
*content = '\0';
return (char*)(src + 1);
}
*content = *src;
break;
case '\0':
return 0;
default:
*content = *src;
break;
}
}
return 0;
}
/* Entered after the "<?" sequence. Ready to read the rest.
*/
static void
read_instruction(PInfo pi) {
char content[1024];
struct _AttrStack attrs;
char *attr_name;
char *attr_value;
char *target;
char *end;
char c;
char *cend;
int attrs_ok = 1;
*content = '\0';
attr_stack_init(&attrs);
if (0 == (target = read_name_token(pi))) {
return;
}
end = pi->s;
if (0 == (cend = gather_content(pi->s, content, sizeof(content) - 1))) {
set_error(&pi->err, "processing instruction content too large or not terminated", pi->str, pi->s);
return;
}
next_non_white(pi);
c = *pi->s;
*end = '\0'; /* terminate name */
if ('?' != c) {
while ('?' != c) {
pi->last = 0;
if ('\0' == *pi->s) {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, processing instruction not terminated", pi->str, pi->s);
return;
}
next_non_white(pi);
if (0 == (attr_name = read_name_token(pi))) {
attr_stack_cleanup(&attrs);
return;
}
end = pi->s;
next_non_white(pi);
if ('=' != *pi->s++) {
attrs_ok = 0;
break;
}
*end = '\0'; /* terminate name */
/* read value */
next_non_white(pi);
if (0 == (attr_value = read_quoted_value(pi))) {
attr_stack_cleanup(&attrs);
return;
}
attr_stack_push(&attrs, attr_name, attr_value);
next_non_white(pi);
if ('\0' == pi->last) {
c = *pi->s;
} else {
c = pi->last;
}
}
if ('?' == *pi->s) {
pi->s++;
}
} else {
pi->s++;
}
if (attrs_ok) {
if ('>' != *pi->s++) {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, processing instruction not terminated", pi->str, pi->s);
return;
}
} else {
pi->s = cend + 1;
}
if (0 != pi->pcb->instruct) {
if (attrs_ok) {
pi->pcb->instruct(pi, target, attrs.head, 0);
} else {
pi->pcb->instruct(pi, target, attrs.head, content);
}
}
attr_stack_cleanup(&attrs);
}
static void
read_delimited(PInfo pi, char end) {
char c;
if ('"' == end || '\'' == end) {
for (c = *pi->s++; end != c; c = *pi->s++) {
if ('\0' == c) {
set_error(&pi->err, "invalid format, doctype not terminated", pi->str, pi->s);
return;
}
}
} else {
while (1) {
c = *pi->s++;
if (end == c) {
return;
}
switch (c) {
case '\0':
set_error(&pi->err, "invalid format, doctype not terminated", pi->str, pi->s);
return;
case '"':
read_delimited(pi, c);
break;
case '\'':
read_delimited(pi, c);
break;
case '[':
read_delimited(pi, ']');
break;
case '<':
read_delimited(pi, '>');
break;
default:
break;
}
}
}
}
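/* Illustrative sketch, not part of Ox: read_delimited() above skips a
 * DOCTYPE body by recursing on every nested quote, '[' and '<' it meets.
 * The standalone function below does the same on a plain string and returns
 * a pointer instead of recording an error. The name sketch_skip_delimited
 * is hypothetical.
 */
```c
#include <stddef.h>

/* Scans s for the closing delimiter end, stepping over nested quoted
 * strings, [...] sections and <...> tags. Returns a pointer just past the
 * closing delimiter, or NULL if the input ends first. */
static const char *sketch_skip_delimited(const char *s, char end) {
    for (;;) {
        char c = *s++;

        if ('\0' == c) {
            return NULL; /* not terminated */
        }
        if (end == c) {
            return s;
        }
        if ('"' == end || '\'' == end) {
            continue; /* nothing nests inside a quoted string */
        }
        switch (c) {
        case '"':
        case '\'':
            s = sketch_skip_delimited(s, c);
            break;
        case '[':
            s = sketch_skip_delimited(s, ']');
            break;
        case '<':
            s = sketch_skip_delimited(s, '>');
            break;
        default:
            break;
        }
        if (NULL == s) {
            return NULL; /* a nested section was not terminated */
        }
    }
}
```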
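/* Entered after the "<!DOCTYPE" sequence. Ready to read the rest.
*/
static void
read_doctype(PInfo pi) {
char *docType;
next_non_white(pi);
docType = pi->s;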
read_delimited(pi, '>');
if (err_has(&pi->err)) {
return;
}
pi->s--;
*pi->s = '\0';
pi->s++;
if (0 != pi->pcb->add_doctype) {
pi->pcb->add_doctype(pi, docType);
}
}
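/* Entered after "<!--". Ready to read the rest.
*/
static void
read_comment(PInfo pi) {
char *end;
char *s;
char *comment;
int done = 0;
next_non_white(pi);
comment = pi->s;
end = strstr(pi->s, "-->");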
if (0 == end) {
set_error(&pi->err, "invalid format, comment not terminated", pi->str, pi->s);
return;
}
for (s = end - 1; pi->s < s && !done; s--) {
switch(*s) {
case ' ':
case '\t':
case '\f':
case '\n':
case '\r':
break;
default:
*(s + 1) = '\0';
done = 1;
break;
}
}
*end = '\0'; /* in case the comment was blank */
pi->s = end + 3;
if (0 != pi->pcb->add_comment) {
pi->pcb->add_comment(pi, comment);
}
}
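/* Illustrative sketch, not part of Ox: the comment reader above locates the
 * closing "-->" and then walks backwards to cut trailing white space by
 * writing a '\0' inside the input buffer. The helper below isolates that
 * in-place trim. The names sketch_is_white and sketch_rtrim are
 * hypothetical, and the loop bounds are simplified compared to the original.
 */
```c
#include <stddef.h>

static int sketch_is_white(char c) {
    switch (c) {
    case ' ':
    case '\t':
    case '\f':
    case '\n':
    case '\r':
        return 1;
    default:
        return 0;
    }
}

/* Terminates the text in [start, end) right after its last non-white
 * character by writing '\0' into the buffer. Returns start. */
static char *sketch_rtrim(char *start, char *end) {
    while (start < end && sketch_is_white(*(end - 1))) {
        end--;
    }
    *end = '\0'; /* also covers the all-blank case, where end == start */
    return start;
}
```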
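/* Entered after the '<' and the first character after that. Returns 0 when
* the element was closed normally or an error was set, otherwise the name of
* a mismatched closing tag encountered in tolerant mode.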
*/
static char*
read_element(PInfo pi) {
struct _AttrStack attrs;
const char *attr_name;
const char *attr_value;
char *name;
char *ename;
char *end;
char c;
long elen;
int hasChildren = 0;
int done = 0;
attr_stack_init(&attrs);
if (0 == (ename = read_name_token(pi))) {
return 0;
}
end = pi->s;
elen = end - ename;
next_non_white(pi);
c = *pi->s;
*end = '\0';
if ('/' == c) {
/* empty element, no attributes and no children */
pi->s++;
if ('>' != *pi->s) {
/*printf("*** '%s' ***\n", pi->s); */
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, element not closed", pi->str, pi->s);
return 0;
}
pi->s++; /* past > */
pi->pcb->add_element(pi, ename, attrs.head, hasChildren);
pi->pcb->end_element(pi, ename);
attr_stack_cleanup(&attrs);
return 0;
}
/* read attribute names until the close (/ or >) is reached */
while (!done) {
if ('\0' == c) {
next_non_white(pi);
c = *pi->s;
}
pi->last = 0;
switch (c) {
case '\0':
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
case '/':
/* Element with just attributes. */
pi->s++;
if ('>' != *pi->s) {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, element not closed", pi->str, pi->s);
return 0;
}
pi->s++;
pi->pcb->add_element(pi, ename, attrs.head, hasChildren);
pi->pcb->end_element(pi, ename);
attr_stack_cleanup(&attrs);
return 0;
case '>':
/* has either children or a value */
pi->s++;
hasChildren = 1;
done = 1;
pi->pcb->add_element(pi, ename, attrs.head, hasChildren);
break;
default:
/* Attribute name so it's an element and the attribute will be */
/* added to it. */
if (0 == (attr_name = read_name_token(pi))) {
attr_stack_cleanup(&attrs);
return 0;
}
end = pi->s;
next_non_white(pi);
if ('=' != *pi->s++) {
if (TolerantEffort == pi->options->effort) {
pi->s--;
pi->last = *pi->s;
*end = '\0'; /* terminate name */
attr_value = "";
attr_stack_push(&attrs, attr_name, attr_value);
break;
} else {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, no attribute value", pi->str, pi->s);
return 0;
}
}
*end = '\0'; /* terminate name */
/* read value */
next_non_white(pi);
if (0 == (attr_value = read_quoted_value(pi))) {
return 0;
}
if (pi->options->convert_special && 0 != strchr(attr_value, '&')) {
if (0 != collapse_special(pi, (char*)attr_value) || err_has(&pi->err)) {
attr_stack_cleanup(&attrs);
return 0;
}
}
attr_stack_push(&attrs, attr_name, attr_value);
break;
}
if ('\0' == pi->last) {
c = '\0';
} else {
c = pi->last;
pi->last = '\0';
}
}
if (hasChildren) {
char *start;
int first = 1;
done = 0;
/* read children */
while (!done) {
start = pi->s;
next_non_white(pi);
c = *pi->s++;
if ('\0' == c) {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
}
if ('<' == c) {
char *slash;
switch (*pi->s) {
case '!': /* better be a comment or CDATA */
pi->s++;
if ('-' == *pi->s && '-' == *(pi->s + 1)) {
pi->s += 2;
read_comment(pi);
} else if ((TolerantEffort == pi->options->effort) ?
0 == strncasecmp("[CDATA[", pi->s, 7) :
0 == strncmp("[CDATA[", pi->s, 7)) {
pi->s += 7;
read_cdata(pi);
} else {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, invalid comment or CDATA format", pi->str, pi->s);
return 0;
}
break;
case '?': /* processing instruction */
pi->s++;
read_instruction(pi);
break;
case '/':
slash = pi->s;
pi->s++;
if (0 == (name = read_name_token(pi))) {
attr_stack_cleanup(&attrs);
return 0;
}
end = pi->s;
next_non_white(pi);
c = *pi->s;
*end = '\0';
if (0 != ((TolerantEffort == pi->options->effort) ? strcasecmp(name, ename) : strcmp(name, ename))) {
attr_stack_cleanup(&attrs);
if (TolerantEffort == pi->options->effort) {
pi->pcb->end_element(pi, ename);
return name;
} else {
set_error(&pi->err, "invalid format, elements overlap", pi->str, pi->s);
return 0;
}
}
if ('>' != c) {
attr_stack_cleanup(&attrs);
set_error(&pi->err, "invalid format, element not closed", pi->str, pi->s);
return 0;
}
if (first && start != slash - 1) {
// Some white space between start and here so add as
// text after checking skip.
*(slash - 1) = '\0';
switch (pi->options->skip) {
case CrSkip: {
char *s = start;
char *e = start;
for (; '\0' != *e; e++) {
if ('\r' != *e) {
*s++ = *e;
}
}
*s = '\0';
break;
}
case SpcSkip:
*start = '\0';
break;
case NoSkip:
case OffSkip:
default:
break;
}
if ('\0' != *start) {
pi->pcb->add_text(pi, start, 1);
}
}
pi->s++;
pi->pcb->end_element(pi, ename);
attr_stack_cleanup(&attrs);
return 0;
case '\0':
attr_stack_cleanup(&attrs);
if (TolerantEffort == pi->options->effort) {
return 0;
} else {
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
}
default:
first = 0;
/* a child element */
// Child closed with mismatched name.
if (0 != (name = read_element(pi))) {
attr_stack_cleanup(&attrs);
if (0 == ((TolerantEffort == pi->options->effort) ? strcasecmp(name, ename) : strcmp(name, ename))) {
pi->s++;
pi->pcb->end_element(pi, ename);
return 0;
} else { // not the correct element yet
pi->pcb->end_element(pi, ename);
return name;
}
} else if (err_has(&pi->err)) {
return 0;
}
break;
}
} else { /* read as TEXT */
pi->s = start;
/*pi->s--; */
read_text(pi);
/*read_reduced_text(pi); */
/* to exit read_text with no errors the next character must be < */
if ('/' == *(pi->s + 1) &&
0 == ((TolerantEffort == pi->options->effort) ? strncasecmp(ename, pi->s + 2, elen) : strncmp(ename, pi->s + 2, elen)) &&
'>' == *(pi->s + elen + 2)) {
/* close tag after text so treat as a value */
pi->s += elen + 3;
pi->pcb->end_element(pi, ename);
attr_stack_cleanup(&attrs);
return 0;
}
}
}
}
attr_stack_cleanup(&attrs);
return 0;
}
static void
read_text(PInfo pi) {
char buf[MAX_TEXT_LEN];
char *b = buf;
char *alloc_buf = 0;
char *end = b + sizeof(buf) - 2;
char c;
int done = 0;
while (!done) {
c = *pi->s++;
switch(c) {
case '<':
done = 1;
pi->s--;
break;
case '\0':
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return;
default:
if (end <= (b + (('&' == c) ? 7 : 0))) { /* reserve 7 extra bytes for a special in case it expands to a sequence of bytes */
unsigned long size;
if (0 == alloc_buf) {
size = sizeof(buf) * 2;
alloc_buf = ALLOC_N(char, size);
memcpy(alloc_buf, buf, b - buf);
b = alloc_buf + (b - buf);
} else {
unsigned long pos = b - alloc_buf;
size = (end - alloc_buf) * 2;
REALLOC_N(alloc_buf, char, size);
b = alloc_buf + pos;
}
end = alloc_buf + size - 2;
}
if ('&' == c) {
if (0 == (b = read_coded_chars(pi, b))) {
return;
}
} else {
if (0 <= c && c <= 0x20) {
if (StrictEffort == pi->options->effort && 'x' == xml_valid_lower_chars[(unsigned char)c]) {
set_error(&pi->err, "invalid character", pi->str, pi->s);
return;
}
switch (pi->options->skip) {
case CrSkip:
if (buf != b && '\n' == c && '\r' == *(b - 1)) {
*(b - 1) = '\n';
} else {
*b++ = c;
}
break;
case SpcSkip:
if (is_white(c)) {
if (buf == b || ' ' != *(b - 1)) {
*b++ = ' ';
}
} else {
*b++ = c;
}
break;
case NoSkip:
case OffSkip:
default:
*b++ = c;
break;
}
} else {
*b++ = c;
}
}
break;
}
}
*b = '\0';
if (0 != alloc_buf) {
pi->pcb->add_text(pi, alloc_buf, ('/' == *(pi->s + 1)));
xfree(alloc_buf);
} else {
pi->pcb->add_text(pi, buf, ('/' == *(pi->s + 1)));
}
}
#if 0
static void
read_reduced_text(PInfo pi) {
char buf[MAX_TEXT_LEN];
char *b = buf;
char *alloc_buf = 0;
char *end = b + sizeof(buf) - 2;
char c;
int spc = 0;
int done = 0;
while (!done) {
c = *pi->s++;
switch(c) {
case ' ':
case '\t':
case '\f':
case '\n':
case '\r':
spc = 1;
break;
case '<':
done = 1;
pi->s--;
break;
case '\0':
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return;
default:
if (end <= (b + spc + (('&' == c) ? 7 : 0))) { /* extra 8 for special just in case it is sequence of bytes */
unsigned long size;
if (0 == alloc_buf) {
size = sizeof(buf) * 2;
alloc_buf = ALLOC_N(char, size);
memcpy(alloc_buf, buf, b - buf);
b = alloc_buf + (b - buf);
} else {
unsigned long pos = b - alloc_buf;
size = (end - alloc_buf) * 2;
REALLOC_N(alloc_buf, char, size);
b = alloc_buf + pos;
}
end = alloc_buf + size - 2;
}
if (spc) {
*b++ = ' ';
}
spc = 0;
if ('&' == c) {
if (0 == (b = read_coded_chars(pi, b))) {
return;
}
} else {
*b++ = c;
}
break;
}
}
*b = '\0';
if (0 != alloc_buf) {
pi->pcb->add_text(pi, alloc_buf, ('/' == *(pi->s + 1)));
xfree(alloc_buf);
} else {
pi->pcb->add_text(pi, buf, ('/' == *(pi->s + 1)));
}
}
#endif
static char*
read_name_token(PInfo pi) {
char *start;
next_non_white(pi);
start = pi->s;
for (; 1; pi->s++) {
switch (*pi->s) {
case ' ':
case '\t':
case '\f':
case '?':
case '=':
case '/':
case '>':
case '\n':
case '\r':
return start;
case '\0':
/* documents never terminate after a name token */
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
break; /* to avoid warnings */
case ':':
if ('\0' == *pi->options->strip_ns) {
break;
} else if ('*' == *pi->options->strip_ns && '\0' == pi->options->strip_ns[1]) {
start = pi->s + 1;
} else if (0 == strncmp(pi->options->strip_ns, start, pi->s - start)) {
start = pi->s + 1;
}
break;
default:
break;
}
}
return start;
}
static void
read_cdata(PInfo pi) {
char *start;
char *end;
start = pi->s;
end = strstr(pi->s, "]]>");
if (end == 0) {
set_error(&pi->err, "invalid format, CDATA not terminated", pi->str, pi->s);
return;
}
*end = '\0';
pi->s = end + 3;
if (0 != pi->pcb->add_cdata) {
pi->pcb->add_cdata(pi, start, end - start);
}
}
/* Assume the value starts immediately and goes until the quote character is
* reached again. Do not read the character after the terminating quote.
*/
static char*
read_quoted_value(PInfo pi) {
char *value = 0;
if ('"' == *pi->s || '\'' == *pi->s) {
char term = *pi->s;
pi->s++; /* skip quote character */
value = pi->s;
for (; *pi->s != term; pi->s++) {
if ('\0' == *pi->s) {
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
}
}
*pi->s = '\0'; /* terminate value */
pi->s++; /* move past quote */
} else if (StrictEffort == pi->options->effort) {
set_error(&pi->err, "invalid format, expected a quote character", pi->str, pi->s);
return 0;
} else if (TolerantEffort == pi->options->effort) {
value = pi->s;
for (; 1; pi->s++) {
switch (*pi->s) {
case '\0':
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
case ' ':
case '/':
case '>':
case '?': // for instructions
case '\t':
case '\n':
case '\r':
pi->last = *pi->s;
*pi->s = '\0'; /* terminate value */
pi->s++;
return value;
default:
break;
}
}
} else {
value = pi->s;
next_white(pi);
if ('\0' == *pi->s) {
set_error(&pi->err, "invalid format, document not terminated", pi->str, pi->s);
return 0;
}
*pi->s++ = '\0'; /* terminate value */
}
return value;
}
static char*
read_hex_uint64(char *b, uint64_t *up) {
uint64_t u = 0;
char c;
for (; ';' != *b; b++) {
c = *b;
if ('0' <= c && c <= '9') {
u = (u << 4) | (uint64_t)(c - '0');
} else if ('a' <= c && c <= 'f') {
u = (u << 4) | (uint64_t)(c - 'a' + 10);
} else if ('A' <= c && c <= 'F') {
u = (u << 4) | (uint64_t)(c - 'A' + 10);
} else {
return 0;
}
}
*up = u;
return b;
}
static char*
read_10_uint64(char *b, uint64_t *up) {
uint64_t u = 0;
char c;
for (; ';' != *b; b++) {
c = *b;
if ('0' <= c && c <= '9') {
u = (u * 10) + (uint64_t)(c - '0');
} else {
return 0;
}
}
*up = u;
return b;
}
static char*
read_coded_chars(PInfo pi, char *text) {
char *b, buf[32];
char *end = buf + sizeof(buf) - 1;
char *s;
for (b = buf, s = pi->s; b < end; b++, s++) {
*b = *s;
if (';' == *s) {
*(b + 1) = '\0';
s++;
break;
}
}
if (b >= end) { /* no ';' found before buf filled up; the loop exits with b == end */
*text++ = '&';
} else if ('#' == *buf) {
uint64_t u = 0;
b = buf + 1;
if ('x' == *b || 'X' == *b) {
b = read_hex_uint64(b + 1, &u);
} else {
b = read_10_uint64(b, &u);
}
if (0 == b) {
*text++ = '&';
} else {
if (u <= 0x000000000000007FULL) {
*text++ = (char)u;
#if HAS_PRIVATE_ENCODING
} else if (ox_utf8_encoding == pi->options->rb_enc ||
0 == strcasecmp(rb_str_ptr(rb_String(ox_utf8_encoding)), rb_str_ptr(rb_String(pi->options->rb_enc)))) {
#else
} else if (ox_utf8_encoding == pi->options->rb_enc) {
#endif
text = ox_ucs_to_utf8_chars(text, u);
#if HAS_PRIVATE_ENCODING
} else if (Qnil == pi->options->rb_enc) {
#else
} else if (0 == pi->options->rb_enc) {
#endif
pi->options->rb_enc = ox_utf8_encoding;
text = ox_ucs_to_utf8_chars(text, u);
} else if (TolerantEffort == pi->options->effort) {
*text++ = '&';
return text;
} else if (u <= 0x00000000000000FFULL) {
*text++ = (char)u;
} else {
/*set_error(&pi->err, "Invalid encoding, need UTF-8 or UTF-16 encoding to parse nnnn; character sequences.", pi->str, pi->s); */
set_error(&pi->err, "Invalid encoding, need UTF-8 encoding to parse nnnn; character sequences.", pi->str, pi->s);
return 0;
}
pi->s = s;
}
} else if (0 == strcasecmp(buf, "nbsp;")) {
pi->s = s;
*text++ = ' ';
} else if (0 == strcasecmp(buf, "lt;")) {
pi->s = s;
*text++ = '<';
} else if (0 == strcasecmp(buf, "gt;")) {
pi->s = s;
*text++ = '>';
} else if (0 == strcasecmp(buf, "amp;")) {
pi->s = s;
*text++ = '&';
} else if (0 == strcasecmp(buf, "quot;")) {
pi->s = s;
*text++ = '"';
} else if (0 == strcasecmp(buf, "apos;")) {
pi->s = s;
*text++ = '\'';
} else {
*text++ = '&';
}
return text;
}
static int
collapse_special(PInfo pi, char *str) {
char *s = str;
char *b = str;
while ('\0' != *s) {
if ('&' == *s) {
int c;
char *end;
s++;
if ('#' == *s) {
uint64_t u = 0;
char x;
s++;
if ('x' == *s || 'X' == *s) {
x = *s;
s++;
end = read_hex_uint64(s, &u);
} else {
x = '\0';
end = read_10_uint64(s, &u);
}
if (0 == end) {
if (TolerantEffort == pi->options->effort) {
*b++ = '&';
*b++ = '#';
if ('\0' != x) {
*b++ = x;
}
continue;
}
return EDOM;
}
if (u <= 0x000000000000007FULL) {
*b++ = (char)u;
#if HAS_PRIVATE_ENCODING
} else if (ox_utf8_encoding == pi->options->rb_enc ||
0 == strcasecmp(rb_str_ptr(rb_String(ox_utf8_encoding)), rb_str_ptr(rb_String(pi->options->rb_enc)))) {
#else
} else if (ox_utf8_encoding == pi->options->rb_enc) {
#endif
b = ox_ucs_to_utf8_chars(b, u);
/* TBD support UTF-16 */
#if HAS_PRIVATE_ENCODING
} else if (Qnil == pi->options->rb_enc) {
#else
} else if (0 == pi->options->rb_enc) {
#endif
pi->options->rb_enc = ox_utf8_encoding;
b = ox_ucs_to_utf8_chars(b, u);
} else {
/* set_error(&pi->err, "Invalid encoding, need UTF-8 or UTF-16 encoding to parse nnnn; character sequences.", pi->str, pi->s);*/
set_error(&pi->err, "Invalid encoding, need UTF-8 encoding to parse nnnn; character sequences.", pi->str, pi->s);
return 0;
}
s = end + 1;
} else {
if (0 == strncasecmp(s, "lt;", 3)) {
c = '<';
s += 3;
} else if (0 == strncasecmp(s, "gt;", 3)) {
c = '>';
s += 3;
} else if (0 == strncasecmp(s, "amp;", 4)) {
c = '&';
s += 4;
} else if (0 == strncasecmp(s, "quot;", 5)) {
c = '"';
s += 5;
} else if (0 == strncasecmp(s, "apos;", 5)) {
c = '\'';
s += 5;
} else if (TolerantEffort == pi->options->effort) {
*b++ = '&';
continue;
} else {
c = '?';
while (';' != *s++) {
if ('\0' == *s) {
set_error(&pi->err, "Invalid format, special character does not end with a semicolon", pi->str, pi->s);
return EDOM;
}
}
s++;
set_error(&pi->err, "Invalid format, invalid special character sequence", pi->str, pi->s);
return 0;
}
*b++ = (char)c;
}
} else {
*b++ = *s++;
}
}
*b = '\0';
return 0;
}
ox-2.8.2/ext/ox/sax_buf.c 0000644 0000041 0000041 00000013317 13203413063 015222 0 ustar www-data www-data
/* sax_buf.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include
#include
#include
#include
#include
#if NEEDS_UIO
#include
#endif
#include
#include
#include "ruby.h"
#include "ox.h"
#include "sax.h"
#define BUF_PAD 4
static VALUE rescue_cb(VALUE rdr, VALUE err);
static VALUE io_cb(VALUE rdr);
static VALUE partial_io_cb(VALUE rdr);
static int read_from_io(Buf buf);
static int read_from_fd(Buf buf);
static int read_from_io_partial(Buf buf);
static int read_from_str(Buf buf);
void
ox_sax_buf_init(Buf buf, VALUE io) {
volatile VALUE io_class = rb_obj_class(io);
VALUE rfd;
if (rb_cString == io_class) {
buf->read_func = read_from_str;
buf->in.str = StringValuePtr(io);
} else if (ox_stringio_class == io_class && 0 == FIX2INT(rb_funcall2(io, ox_pos_id, 0, 0))) {
volatile VALUE s = rb_funcall2(io, ox_string_id, 0, 0);
buf->read_func = read_from_str;
buf->in.str = StringValuePtr(s);
} else if (rb_cFile == io_class && Qnil != (rfd = rb_funcall(io, ox_fileno_id, 0))) {
buf->read_func = read_from_fd;
buf->in.fd = FIX2INT(rfd);
} else if (rb_respond_to(io, ox_readpartial_id)) {
buf->read_func = read_from_io_partial;
buf->in.io = io;
} else if (rb_respond_to(io, ox_read_id)) {
buf->read_func = read_from_io;
buf->in.io = io;
} else {
rb_raise(ox_arg_error_class, "sax_parser io argument must respond to readpartial() or read().\n");
}
buf->head = buf->base;
*buf->head = '\0';
buf->end = buf->head + sizeof(buf->base) - BUF_PAD;
buf->tail = buf->head;
buf->read_end = buf->head;
buf->pro = 0;
buf->str = 0;
buf->pos = 0;
buf->line = 1;
buf->col = 0;
buf->pro_pos = 1;
buf->pro_line = 1;
buf->pro_col = 0;
buf->dr = 0;
}
int
ox_sax_buf_read(Buf buf) {
int err;
size_t shift = 0;
// if there is not much room to read into, shift or realloc a larger buffer.
if (buf->head < buf->tail && 4096 > buf->end - buf->tail) {
if (0 == buf->pro) {
shift = buf->tail - buf->head;
} else {
shift = buf->pro - buf->head - 1; // leave one character so we can back up one
}
if (0 >= shift) { /* no space left so allocate more */
char *old = buf->head;
size_t size = buf->end - buf->head + BUF_PAD;
if (buf->head == buf->base) {
buf->head = ALLOC_N(char, size * 2);
memcpy(buf->head, old, size);
} else {
REALLOC_N(buf->head, char, size * 2);
}
buf->end = buf->head + size * 2 - BUF_PAD;
buf->tail = buf->head + (buf->tail - old);
buf->read_end = buf->head + (buf->read_end - old);
if (0 != buf->pro) {
buf->pro = buf->head + (buf->pro - old);
}
if (0 != buf->str) {
buf->str = buf->head + (buf->str - old);
}
} else {
memmove(buf->head, buf->head + shift, buf->read_end - (buf->head + shift));
buf->tail -= shift;
buf->read_end -= shift;
if (0 != buf->pro) {
buf->pro -= shift;
}
if (0 != buf->str) {
buf->str -= shift;
}
}
}
err = buf->read_func(buf);
*buf->read_end = '\0';
return err;
}
static VALUE
rescue_cb(VALUE rbuf, VALUE err) {
VALUE err_class = rb_obj_class(err);
if (err_class != rb_eTypeError && err_class != rb_eEOFError) {
Buf buf = (Buf)rbuf;
//ox_sax_drive_cleanup(buf->dr); called after exiting protect
rb_raise(err, "at line %d, column %d\n", buf->line, buf->col);
}
return Qfalse;
}
static VALUE
partial_io_cb(VALUE rbuf) {
Buf buf = (Buf)rbuf;
VALUE args[1];
VALUE rstr;
char *str;
size_t cnt;
args[0] = ULONG2NUM(buf->end - buf->tail);
rstr = rb_funcall2(buf->in.io, ox_readpartial_id, 1, args);
str = StringValuePtr(rstr);
cnt = strlen(str);
//printf("*** read partial %lu bytes, str: '%s'\n", cnt, str);
strcpy(buf->tail, str);
buf->read_end = buf->tail + cnt;
return Qtrue;
}
static VALUE
io_cb(VALUE rbuf) {
Buf buf = (Buf)rbuf;
VALUE args[1];
VALUE rstr;
char *str;
size_t cnt;
args[0] = ULONG2NUM(buf->end - buf->tail);
rstr = rb_funcall2(buf->in.io, ox_read_id, 1, args);
str = StringValuePtr(rstr);
cnt = strlen(str);
//printf("*** read %lu bytes, str: '%s'\n", cnt, str);
strcpy(buf->tail, str);
buf->read_end = buf->tail + cnt;
return Qtrue;
}
static int
read_from_io_partial(Buf buf) {
return (Qfalse == rb_rescue(partial_io_cb, (VALUE)buf, rescue_cb, (VALUE)buf));
}
static int
read_from_io(Buf buf) {
return (Qfalse == rb_rescue(io_cb, (VALUE)buf, rescue_cb, (VALUE)buf));
}
static int
read_from_fd(Buf buf) {
ssize_t cnt;
size_t max = buf->end - buf->tail;
cnt = read(buf->in.fd, buf->tail, max);
if (cnt < 0) {
ox_sax_drive_error(buf->dr, "failed to read from file");
return -1;
} else if (0 != cnt) {
buf->read_end = buf->tail + cnt;
}
return 0;
}
static char*
ox_stpncpy(char *dest, const char *src, size_t n) {
size_t cnt = strlen(src) + 1;
if (n < cnt) {
cnt = n;
}
strncpy(dest, src, cnt);
return dest + cnt - 1;
}
static int
read_from_str(Buf buf) {
size_t max = buf->end - buf->tail - 1;
char *s;
long cnt;
if ('\0' == *buf->in.str) {
/* done */
return -1;
}
s = ox_stpncpy(buf->tail, buf->in.str, max);
*s = '\0';
cnt = s - buf->tail;
buf->in.str += cnt;
buf->read_end = buf->tail + cnt;
return 0;
}
ox-2.8.2/ext/ox/base64.c 0000644 0000041 0000041 00000005765 13203413063 014667 0 ustar www-data www-data
/* base64.c
* Copyright (c) 2011, Peter Ohler
* All rights reserved.
*/
#include
#include
#include "base64.h"
static char digits[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/* invalid or terminating characters are set to 'X' or \x58 */
static uchar s_digits[256] = "\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x3E\x58\x58\x58\x3F\
\x34\x35\x36\x37\x38\x39\x3A\x3B\x3C\x3D\x58\x58\x58\x58\x58\x58\
\x58\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\
\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x58\x58\x58\x58\x58\
\x58\x1A\x1B\x1C\x1D\x1E\x1F\x20\x21\x22\x23\x24\x25\x26\x27\x28\
\x29\x2A\x2B\x2C\x2D\x2E\x2F\x30\x31\x32\x33\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\
\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58\x58";
void
to_base64(const uchar *src, int len, char *b64) {
const uchar *end3;
int len3 = len % 3;
uchar b1, b2, b3;
end3 = src + (len - len3);
while (src < end3) {
b1 = *src++;
b2 = *src++;
b3 = *src++;
*b64++ = digits[(uchar)(b1 >> 2)];
*b64++ = digits[(uchar)(((b1 & 0x03) << 4) | (b2 >> 4))];
*b64++ = digits[(uchar)(((b2 & 0x0F) << 2) | (b3 >> 6))];
*b64++ = digits[(uchar)(b3 & 0x3F)];
}
if (1 == len3) {
b1 = *src++;
*b64++ = digits[b1 >> 2];
*b64++ = digits[(b1 & 0x03) << 4];
*b64++ = '=';
*b64++ = '=';
} else if (2 == len3) {
b1 = *src++;
b2 = *src++;
*b64++ = digits[b1 >> 2];
*b64++ = digits[((b1 & 0x03) << 4) | (b2 >> 4)];
*b64++ = digits[(b2 & 0x0F) << 2];
*b64++ = '=';
}
*b64 = '\0';
}
unsigned long
b64_orig_size(const char *text) {
const char *start = text;
unsigned long size = 0;
if ('\0' != *text) {
for (; 0 != *text; text++) { }
size = (text - start) * 3 / 4;
text--;
if ('=' == *text) {
size--;
text--;
if ('=' == *text) {
size--;
}
}
}
return size;
}
void
from_base64(const char *b64, uchar *str) {
uchar b0, b1, b2, b3;
while (1) {
if ('X' == (b0 = s_digits[(uchar)*b64++])) { break; }
if ('X' == (b1 = s_digits[(uchar)*b64++])) { break; }
*str++ = (b0 << 2) | ((b1 >> 4) & 0x03);
if ('X' == (b2 = s_digits[(uchar)*b64++])) { break; }
*str++ = (b1 << 4) | ((b2 >> 2) & 0x0F);
if ('X' == (b3 = s_digits[(uchar)*b64++])) { break; }
*str++ = (b2 << 6) | b3;
}
*str = '\0';
}
ox-2.8.2/README.md 0000644 0000041 0000041 00000017310 13203413063 013455 0 ustar www-data www-data
# Ox gem
A fast XML parser and Object marshaller as a Ruby gem.
## Installation
gem install ox
## Documentation
*Documentation*: http://www.ohler.com/ox
## Source
*GitHub* *repo*: https://github.com/ohler55/ox
*RubyGems* *repo*: https://rubygems.org/gems/ox
## Follow @oxgem on Twitter
[Follow @peterohler on Twitter](http://twitter.com/#!/peterohler) for announcements and news about the Ox gem.
## Build Status
[Build Status](http://travis-ci.org/ohler55/ox)
## Donate
[Donate via Gratipay](https://gratipay.com/ox/)
## Links of Interest
[Ruby XML Gem Comparison](http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html) for a performance comparison between Ox, Nokogiri, and LibXML.
[Fast Ruby XML Serialization](http://www.ohler.com/dev/ruby_object_xml_serialization/ruby_object_xml_serialization.html) to see how Ox can be used as a faster replacement for Marshal.
*Fast JSON parser and marshaller on RubyGems*: https://rubygems.org/gems/oj
*Fast JSON parser and marshaller on GitHub*: https://github.com/ohler55/oj
## Release Notes
See [CHANGELOG.md](CHANGELOG.md)
## Description
Optimized XML (Ox), as the name implies, was written to provide speed-optimized
XML handling and now HTML handling as well. It was designed as an alternative to
Nokogiri and other Ruby XML parsers for generic XML parsing, and as an
alternative to Marshal for Object serialization.
Unlike some other Ruby XML parsers, Ox is self-contained. It uses nothing
other than standard C libraries, so version conflicts with libXml do not
arise.
Marshal uses a binary format for serializing Objects. That binary format
changes between releases, making Marshal-dumped Objects incompatible between
some versions. The use of a binary format makes debugging message streams or
file contents next to impossible unless the same version of Ruby, and only
Ruby, is used to inspect the serialized Object. Ox, on the other hand, uses
human-readable XML. Ox also offers strict and tolerant modes, as well as a mode
that automatically defines missing classes.
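To make the point about binary dumps concrete, here is a minimal sketch that uses only Ruby's built-in Marshal and no Ox calls. The helper name `marshal_format_version` is made up for this illustration; it reads the two format-version bytes that lead every Marshal dump.

```ruby
# Return the [major, minor] format version stored in the first two bytes
# of a Marshal dump. (marshal_format_version is a name made up for this sketch.)
def marshal_format_version(data)
  data.bytes.first(2)
end

data = Marshal.dump([1, 'bee'])
puts "format version: #{marshal_format_version(data).join('.')}"
puts "raw bytes:      #{data.inspect}" # binary, not meant for human eyes
puts "round trip ok:  #{Marshal.load(data) == [1, 'bee']}"
```

The inspected dump is a run of escaped bytes rather than readable text, which is the debugging difficulty described above.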
It is possible to write an XML serialization gem with Nokogiri or other XML
parsers, but writing such a package in Ruby results in a module significantly
slower than Marshal. This is what triggered the start of Ox development.
Ox handles XML documents in three ways. It is a generic XML parser and writer,
a fast Object / XML marshaller, and a stream SAX parser. Ox was written for
speed as a replacement for Nokogiri, Ruby LibXML, and for Marshal.
As an XML parser it is 2 or more times faster than Nokogiri and as a generic
XML writer it is as much as 20 times faster than Nokogiri. Of course different
files may result in slightly different times.
As an Object serializer Ox is up to 6 times faster than the standard Ruby
Marshal.dump() and up to 3 times faster than Marshal.load().
The SAX-like stream parser is 40 times faster than Nokogiri and more than 13
times faster than LibXML when validating a file with minimal Ruby
callbacks. Unlike Nokogiri and LibXML, Ox can be tuned to use only the SAX
callbacks that are of interest to the caller. (See the perf_sax.rb file for an
example.)
Ox is compatible with Ruby 1.8.7, 1.9.3, 2.1.2, 2.2.0 and RBX.
### Object Dump Sample:
```ruby
require 'ox'
class Sample
attr_accessor :a, :b, :c
def initialize(a, b, c)
@a = a
@b = b
@c = c
end
end
# Create Object
obj = Sample.new(1, "bee", ['x', :y, 7.0])
# Now dump the Object to an XML String.
xml = Ox.dump(obj)
# Convert the object back into a Sample Object.
obj2 = Ox.parse_obj(xml)
```
### Generic XML Writing and Parsing:
```ruby
require 'ox'
doc = Ox::Document.new(:version => '1.0')
top = Ox::Element.new('top')
top[:name] = 'sample'
doc << top
mid = Ox::Element.new('middle')
mid[:name] = 'second'
top << mid
bot = Ox::Element.new('bottom')
bot[:name] = 'third'
mid << bot
xml = Ox.dump(doc)
# xml =
# <top name="sample">
#   <middle name="second">
#     <bottom name="third"/>
#   </middle>
# </top>
doc2 = Ox.parse(xml)
puts "Same? #{doc == doc2}"
# true
```
### HTML Parsing:
Ox can be used to parse HTML with a few option changes. HTML is often loose
with regard to conformance. For HTML parsing, try these options.
```ruby
Ox.default_options = {
mode: :generic,
effort: :tolerant,
smart: true
}
```
### SAX XML Parsing:
```ruby
require 'stringio'
require 'ox'
class Sample < ::Ox::Sax
def start_element(name); puts "start: #{name}"; end
def end_element(name); puts "end: #{name}"; end
def attr(name, value); puts " #{name} => #{value}"; end
def text(value); puts "text #{value}"; end
end
io = StringIO.new(%{
<top name="sample">
  <middle name="second">
    <bottom name="third"/>
  </middle>
</top>
})
handler = Sample.new()
Ox.sax_parse(handler, io)
# outputs
# start: top
# name => sample
# start: middle
# name => second
# start: bottom
# name => third
# end: bottom
# end: middle
# end: top
```
### Yielding results immediately while SAX XML Parsing:
```ruby
require 'stringio'
require 'ox'
class Yielder < ::Ox::Sax
def initialize(block); @yield_to = block; end
def start_element(name); @yield_to.call(name); end
end
io = StringIO.new(%{
<top name="sample">
  <middle name="second">
    <bottom name="third"/>
  </middle>
</top>
})
proc = Proc.new { |name| puts name }
handler = Yielder.new(proc)
puts "before parse"
Ox.sax_parse(handler, io)
puts "after parse"
# outputs
# before parse
# top
# middle
# bottom
# after parse
```
### Object XML format
The XML format used for Object encoding follows the structure of the
Object. Each XML element is encoded so that the XML element name is a type
indicator. Attributes of the element provide additional information such as
the Class if relevant, the Object attribute name, and Object ID if
necessary.
The type indicator map is:
- **a** => `Array`
- **b** => `Base64`
- **c** => `Class`
- **f** => `Float`
- **g** => `Regexp`
- **h** => `Hash`
- **i** => `Fixnum`
- **j** => `Bignum`
- **l** => `Rational`
- **m** => `Symbol`
- **n** => `FalseClass`
- **o** => `Object`
- **p** => `Ref`
- **r** => `Range`
- **s** => `String`
- **t** => `Time`
- **u** => `Struct`
- **v** => `Complex`
- **x** => `Raw`
- **y** => `TrueClass`
- **z** => `NilClass`
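The map above can also be written as a lookup table. The sketch below simply copies the list into a Ruby Hash; the constant `TYPE_INDICATORS` and the helper `type_for` are names invented for this illustration and are not part of the Ox API. Type names are kept as Strings because `Fixnum` and `Bignum` no longer exist as constants in recent Ruby versions.

```ruby
# Element name => Ruby type name, copied from the type indicator list.
TYPE_INDICATORS = {
  'a' => 'Array',      'b' => 'Base64',   'c' => 'Class',
  'f' => 'Float',      'g' => 'Regexp',   'h' => 'Hash',
  'i' => 'Fixnum',     'j' => 'Bignum',   'l' => 'Rational',
  'm' => 'Symbol',     'n' => 'FalseClass', 'o' => 'Object',
  'p' => 'Ref',        'r' => 'Range',    's' => 'String',
  't' => 'Time',       'u' => 'Struct',   'v' => 'Complex',
  'x' => 'Raw',        'y' => 'TrueClass', 'z' => 'NilClass'
}.freeze

# Hypothetical helper: look up the type name for an XML element name.
def type_for(element_name)
  TYPE_INDICATORS.fetch(element_name) do
    raise ArgumentError, "unknown type indicator: #{element_name}"
  end
end

puts type_for('h')
puts type_for('z')
```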
If the type is an Object (type 'o'), then an attribute named 'c' should be set
to the full Class name, including the Module names. If the XML element
represents an Object, then a sub-element is included for each attribute of
the Object. An XML element attribute 'a' is set to the name of the Ruby
Object attribute. In all cases except for the Exception attribute hack, the
attribute names begin with an @ character. (Exceptions are strange in that the
attributes of the Exception Class are not named with an @ prefix. The hack is
needed because this has to be done in C and cannot be done through the
interpreter.)
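As a rough illustration of the rules in this paragraph, the following pure-Ruby sketch builds an 'o' element with a 'c' attribute for the Class name and one sub-element per instance variable, each carrying an 'a' attribute. It is an assumption-laden toy, not Ox's encoder: `Pair` and `sketch_object_xml` are hypothetical names, only Integer and String values are handled, and no XML escaping is performed.

```ruby
# A small class used only for this illustration.
class Pair
  def initialize(a, b)
    @a = a
    @b = b
  end
end

# Hypothetical, simplified encoder following the description above.
# Handles only Integer ('i') and String ('s') attribute values.
def sketch_object_xml(obj)
  children = obj.instance_variables.map do |ivar|
    value = obj.instance_variable_get(ivar)
    tag = value.is_a?(Integer) ? 'i' : 's'
    # ivar is a Symbol such as :@a, so the 'a' attribute starts with '@'.
    %(<#{tag} a="#{ivar}">#{value}</#{tag}>)
  end
  %(<o c="#{obj.class.name}">#{children.join}</o>)
end

puts sketch_object_xml(Pair.new(1, 'bee'))
```

For real serialization use `Ox.dump`, as shown in the Object Dump Sample.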
Values are encoded as the text portion of an element or in the sub-elements
of the principal element. For example, a Fixnum is encoded as:
```xml
<i>123</i>
```
An Array has sub-elements and is encoded similarly to this example:
```xml
<a>
  <i>1</i>
  <s>abc</s>
</a>
```
A Hash is encoded with an even number of elements, where the first element of
each pair is the key and the second is the value. This is repeated for each
entry in the Hash. For example, { 1 => 'one', 2 => 'two' } is encoded as:
```xml
<h>
  <i>1</i>
  <s>one</s>
  <i>2</i>
  <s>two</s>
</h>
```
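The Array layout and the alternating key/value layout of a Hash can be sketched in a few lines of plain Ruby. This is only an illustration of the structure described above, under the assumption that values are limited to Integer, String, Array, and Hash; `sketch_value_xml` is a made-up name, the output is not indented, and text is not XML-escaped.

```ruby
# Hypothetical, simplified encoder for a handful of types.
def sketch_value_xml(value)
  case value
  when Integer
    "<i>#{value}</i>"
  when String
    "<s>#{value}</s>"
  when Array
    "<a>#{value.map { |v| sketch_value_xml(v) }.join}</a>"
  when Hash
    # Each entry contributes two elements: the key, then the value.
    pairs = value.flat_map { |k, v| [sketch_value_xml(k), sketch_value_xml(v)] }
    "<h>#{pairs.join}</h>"
  else
    raise ArgumentError, "type not covered by this sketch: #{value.class}"
  end
end

puts sketch_value_xml([1, 'abc'])
puts sketch_value_xml({ 1 => 'one', 2 => 'two' })
```

Because every entry emits exactly two elements, the 'h' element always has an even number of children.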
Strings with characters not allowed in XML are base64 encoded and will be
converted back into a String when loaded.
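The idea can be sketched with Ruby's built-in `Array#pack` and `String#unpack1` using the `m0` (strict base64) directive. The check for "not allowed" characters below is an assumption made for this illustration (control characters other than tab, newline, and carriage return); Ox's own rule lives in its C code and may differ. All helper names are hypothetical.

```ruby
# Assumed rule for this sketch: valid encoding and no control characters
# other than tab (0x09), newline (0x0A), and carriage return (0x0D).
def xml_safe?(str)
  str.valid_encoding? && str !~ /[\x00-\x08\x0B\x0C\x0E-\x1F]/
end

# Emit an 's' element for safe text, otherwise a base64 'b' element.
def sketch_string_xml(str)
  if xml_safe?(str)
    "<s>#{str}</s>"
  else
    "<b>#{[str].pack('m0')}</b>"
  end
end

# Decode the text of a 'b' element back into the original bytes.
def sketch_decode_b(encoded)
  encoded.unpack1('m0')
end

puts sketch_string_xml('abc')
puts sketch_string_xml("a\x01b")
puts sketch_decode_b(["a\x01b"].pack('m0')) == "a\x01b"
```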
Ox supports circular references where attributes of one Object can refer to
an Object that refers back to the first Object. When this option is used an
Object ID is added to each XML Object element as the value of the 'a'
attribute.