XML-XSLT-0.48/
XML-XSLT-0.48/examples/
XML-XSLT-0.48/examples/test.dtd
XML-XSLT-0.48/examples/xmlspec.xsl
http://www.w3.org/ code { font-family: monospace } <xsl:value-of select="header/title"/>
This version:

Latest version:
Previous version s :
Editor s :

( ) < >

Abstract

Status of this document

Table of contents



                        
                            color: red
                        
			
		

Ed. Note:

Issue ():

NOTE:
   ::=    /* */ [ VC:   ] [ WFC:   ]
[ ] [ ]

Validity Constraint:

Well Formedness Constraint:

; , ( )

        
            

    Appendices


        
            
    (Non-Normative) " "
    XML-XSLT-0.48/examples/cml.xsl

    Physical Properties

    Chemical:

    :
    :
    XML-XSLT-0.48/examples/91-22-5.xml
    237.1 -14.9 1.098 2.2 C9H7N 129.2
    XML-XSLT-0.48/examples/test.xsl
    <xsl:value-of select="@NAME"/> Agenda Maand naam: Maand=""


















    XML-XSLT-0.48/examples/identity.xsl
    XML-XSLT-0.48/examples/agenda.xsl
    AGENDA <xsl:value-of select="@NAME"/> Agenda /MAAND
    /AGENDA
    MAAND

    /PUNT
    PUNT

    /INFO
    INFO




    XML-XSLT-0.48/examples/95-48-7.xsl
    version="1.0" encoding="ISO-8859-1"
    XML-XSLT-0.48/examples/test2.xml
    hoi piepeloi! This is some test text...
    Nieuwjaarsborrel 4/1/1999 Subfaculty of Chemistry, B-faculty canteen 16.30
    Informed Chemistry: what can it do for synthesis? 13/1/1999 Chemweb.Com Internet 16.00
    "New materials based on organic synthesis" 2/2/1999 NSR Speaker: dr. Frank van Veggel, Laboratorium voor organische chemie, Universiteit Twente
    Host: Prof. dr. RJM Nolte
    CZ I 14.00
    Paaslympics 5/4/1999 St. Beet and BeeVee, W and N
    Paas-Beestborrel 6/4/1999 BBB and Leonardo
    X-Files: Fight the Future 6/4/1999 St. Beet CZ N2 19.30 1,50 Geert Josten!?!?
    XML-XSLT-0.48/examples/91-22-5.cml
    C9H7N 129.2 237.1 -14.9 1.098 2.2
    XML-XSLT-0.48/examples/REC-xslt-19991116.xml
    XSL Transformations (XSLT) Version 1.0
    REC-xslt-19991116
    W3C Recommendation 16 November 1999

    This version:
    http://www.w3.org/TR/1999/REC-xslt-19991116
    Available formats: XML, HTML

    Latest version:
    http://www.w3.org/TR/xslt

    Previous versions:
    http://www.w3.org/TR/1999/PR-xslt-19991008
    http://www.w3.org/1999/08/WD-xslt-19990813
    http://www.w3.org/1999/07/WD-xslt-19990709
    http://www.w3.org/TR/1999/WD-xslt-19990421
    http://www.w3.org/TR/1998/WD-xsl-19981216
    http://www.w3.org/TR/1998/WD-xsl-19980818

    Editor:
    James Clark jjc@jclark.com

    This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from other documents. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

    The list of known errors in this specification is available at http://www.w3.org/1999/11/REC-xslt-19991116-errata.

    Comments on this specification may be sent to xsl-editors@w3.org; archives of the comments are available. Public discussion of XSL, including XSL Transformations, takes place on the XSL-List mailing list.

    The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/Style/XSL/translations.html.

    A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

    This specification has been produced as part of the W3C Style activity.

    This specification defines the syntax and semantics of XSLT, which is a language for transforming XML documents into other XML documents.

    XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.

    XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL.

    Introduction

    This specification defines the syntax and semantics of the XSLT language. A transformation in the XSLT language is expressed as a well-formed XML document conforming to the Namespaces in XML Recommendation [XML Names], which may include both elements that are defined by XSLT and elements that are not defined by XSLT. XSLT-defined elements are distinguished by belonging to a specific XML namespace (see [XSLT Namespace]), which is referred to in this specification as the XSLT namespace. Thus this specification is a definition of the syntax and semantics of the XSLT namespace.

    A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

    A transformation expressed in XSLT is called a stylesheet. This is because, in the case when XSLT is transforming into the XSL formatting vocabulary, the transformation functions as a stylesheet.

    This document does not specify how an XSLT stylesheet is associated with an XML document. It is recommended that XSL processors support the mechanism described in [XML Stylesheet]. When this or any other mechanism yields a sequence of more than one XSLT stylesheet to be applied simultaneously to an XML document, then the effect should be the same as applying a single stylesheet that imports each member of the sequence in order (see [Stylesheet Import]).

    A stylesheet contains a set of template rules. A template rule has two parts: a pattern which is matched against nodes in the source tree and a template which can be instantiated to form part of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures.

    A template is instantiated for a particular source element to create part of the result tree. A template can contain elements that specify literal result element structure. A template can also contain elements from the XSLT namespace that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates. Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template. Note that elements are only processed when they have been selected by the execution of an instruction. The result tree is constructed by finding the template rule for the root node and instantiating its template.

    In the process of finding the applicable template rule, more than one template rule may have a pattern that matches a given element. However, only one template rule will be applied. The method for deciding which template rule to apply is described in [Conflict Resolution for Template Rules].

    A single template by itself has considerable power: it can create structures of arbitrary complexity; it can pull string values out of arbitrary locations in the source tree; it can generate structures that are repeated according to the occurrence of elements in the source tree. For simple transformations where the structure of the result tree is independent of the structure of the source tree, a stylesheet can often consist of only a single template, which functions as a template for the complete result tree. Transformations on XML documents that represent data are often of this kind (see [Document Example]). XSLT allows a simplified syntax for such stylesheets (see [Literal Result Element as Stylesheet]).

    When a template is instantiated, it is always instantiated with respect to a current node and a current node list. The current node is always a member of the current node list. Many operations in XSLT are relative to the current node. Only a few instructions change the current node list or the current node (see [Template Rules] and [Repetition]); during the instantiation of one of these instructions, the current node list changes to a new list of nodes and each member of this new list becomes the current node in turn; after the instantiation of the instruction is complete, the current node and current node list revert to what they were before the instruction was instantiated.

    XSLT makes use of the expression language defined by [XPath] for selecting elements for processing, for conditional processing and for generating text.

    XSLT provides two hooks for extending the language, one hook for extending the set of instruction elements used in templates and one hook for extending the set of functions used in XPath expressions. These hooks are both based on XML namespaces. This version of XSLT does not define a mechanism for implementing the hooks. See [Extensions].

    The XSL WG intends to define such a mechanism in a future version of this specification or in a separate specification.

    The element syntax summary notation used to describe the syntax of XSLT-defined elements is described in [Notation].

    The MIME media types text/xml and application/xml should be used for XSLT stylesheets. It is possible that a media type will be registered specifically for XSLT stylesheets; if and when it is, that media type may also be used.

    Stylesheet Structure

    XSLT Namespace

    The XSLT namespace has the URI http://www.w3.org/1999/XSL/Transform.

    The 1999 in the URI indicates the year in which the URI was allocated by the W3C. It does not indicate the version of XSLT being used, which is specified by attributes (see [Stylesheet Element] and [Literal Result Element as Stylesheet]).

    XSLT processors must use the XML namespaces mechanism to recognize elements and attributes from this namespace. Elements from the XSLT namespace are recognized only in the stylesheet, not in the source document. The complete list of XSLT-defined elements is specified in [Element Syntax Summary]. Vendors must not extend the XSLT namespace with additional elements or attributes. Instead, any extension must be in a separate namespace. Any namespace that is used for additional instruction elements must be identified by means of the extension element mechanism specified in [Extension Elements].

    This specification uses a prefix of xsl: for referring to elements in the XSLT namespace. However, XSLT stylesheets are free to use any prefix, provided that there is a namespace declaration that binds the prefix to the URI of the XSLT namespace.

    An element from the XSLT namespace may have any attribute not from the XSLT namespace, provided that the expanded-name of the attribute has a non-null namespace URI. The presence of such attributes must not change the behavior of XSLT elements and functions defined in this document. Thus, an XSLT processor is always free to ignore such attributes, and must ignore such attributes without giving an error if it does not recognize the namespace URI. Such attributes can provide, for example, unique identifiers, optimization hints, or documentation.

    It is an error for an element from the XSLT namespace to have attributes with expanded-names that have null namespace URIs (i.e. attributes with unprefixed names) other than attributes defined for the element in this document.

    The conventions used for the names of XSLT elements, attributes and functions are that names are all lower-case, use hyphens to separate words, and use abbreviations only if they already appear in the syntax of a related language such as XML or HTML.

    Stylesheet Element

    A stylesheet is represented by an xsl:stylesheet element in an XML document. xsl:transform is allowed as a synonym for xsl:stylesheet.

    An xsl:stylesheet element must have a version attribute, indicating the version of XSLT that the stylesheet requires. For this version of XSLT, the value should be 1.0. When the value is not equal to 1.0, forwards-compatible processing mode is enabled (see [Forwards-Compatible Processing]).

    The xsl:stylesheet element may contain the following types of elements:

    xsl:import

    xsl:include

    xsl:strip-space

    xsl:preserve-space

    xsl:output

    xsl:key

    xsl:decimal-format

    xsl:namespace-alias

    xsl:attribute-set

    xsl:variable

    xsl:param

    xsl:template

    An element occurring as a child of an xsl:stylesheet element is called a top-level element.

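    This example shows the structure of a stylesheet. Ellipses (...) indicate where attribute values or content have been omitted. Although this example shows one of each type of allowed element, stylesheets may contain zero or more of each of these elements.

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="..."/>

      <xsl:include href="..."/>

      <xsl:strip-space elements="..."/>

      <xsl:preserve-space elements="..."/>

      <xsl:output method="..."/>

      <xsl:key name="..." match="..." use="..."/>

      <xsl:decimal-format name="..."/>

      <xsl:namespace-alias stylesheet-prefix="..." result-prefix="..."/>

      <xsl:attribute-set name="...">
        ...
      </xsl:attribute-set>

      <xsl:variable name="...">...</xsl:variable>

      <xsl:param name="...">...</xsl:param>

      <xsl:template match="...">
        ...
      </xsl:template>

      <xsl:template name="...">
        ...
      </xsl:template>

    </xsl:stylesheet>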

    The order in which the children of the xsl:stylesheet element occur is not significant except for xsl:import elements and for error recovery. Users are free to order the elements as they prefer, and stylesheet creation tools need not provide control over the order in which the elements occur.

    In addition, the xsl:stylesheet element may contain any element not from the XSLT namespace, provided that the expanded-name of the element has a non-null namespace URI. The presence of such top-level elements must not change the behavior of XSLT elements and functions defined in this document; for example, it would not be permitted for such a top-level element to specify that xsl:apply-templates was to use different rules to resolve conflicts. Thus, an XSLT processor is always free to ignore such top-level elements, and must ignore a top-level element without giving an error if it does not recognize the namespace URI. Such elements can provide, for example,

    information used by extension elements or extension functions (see [Extensions]),

    information about what to do with the result tree,

    information about how to obtain the source tree,

    metadata about the stylesheet,

    structured documentation for the stylesheet.

    Literal Result Element as Stylesheet

    A simplified syntax is allowed for stylesheets that consist of only a single template for the root node. The stylesheet may consist of just a literal result element (see [Literal Result Elements]). Such a stylesheet is equivalent to a stylesheet with an xsl:stylesheet element containing a template rule containing the literal result element; the template rule has a match pattern of /. For example

    <html xsl:version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns="http://www.w3.org/TR/xhtml1/strict">
      <head>
        <title>Expense Report Summary</title>
      </head>
      <body>
        <p>Total Amount: <xsl:value-of select="expense-report/total"/></p>
      </body>
    </html>

    has the same meaning as

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns="http://www.w3.org/TR/xhtml1/strict">
    <xsl:template match="/">
    <html>
      <head>
        <title>Expense Report Summary</title>
      </head>
      <body>
        <p>Total Amount: <xsl:value-of select="expense-report/total"/></p>
      </body>
    </html>
    </xsl:template>
    </xsl:stylesheet>

    A literal result element that is the document element of a stylesheet must have an xsl:version attribute, which indicates the version of XSLT that the stylesheet requires. For this version of XSLT, the value should be 1.0; the value must be a Number. Other literal result elements may also have an xsl:version attribute. When the xsl:version attribute is not equal to 1.0, forwards-compatible processing mode is enabled (see [Forwards-Compatible Processing]).

    The allowed content of a literal result element when used as a stylesheet is no different from when it occurs within a stylesheet. Thus, a literal result element used as a stylesheet cannot contain top-level elements.
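    The equivalence between the simplified syntax and the full syntax can be sketched in code. The following is a minimal illustration in Python using the standard xml.etree.ElementTree module; the function name expand_simplified is hypothetical, and the sketch only wraps the literal result element in a template rule matching /, which is one plausible reading of the equivalence described above rather than how any particular processor implements it.

```python
import copy
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XSL = "{" + XSLT_NS + "}"  # ElementTree spells qualified names as {uri}local


def expand_simplified(root):
    """Return an xsl:stylesheet element equivalent to a simplified stylesheet.

    `root` is the document element of the stylesheet document. If it is
    already xsl:stylesheet or xsl:transform it is returned unchanged.
    """
    if root.tag in (XSL + "stylesheet", XSL + "transform"):
        return root
    version = root.get(XSL + "version")
    if version is None:
        raise ValueError("document element has no xsl:version attribute")
    # Work on a copy so the caller's tree is left untouched.
    literal = copy.deepcopy(root)
    del literal.attrib[XSL + "version"]
    stylesheet = ET.Element(XSL + "stylesheet", {"version": version})
    template = ET.SubElement(stylesheet, XSL + "template", {"match": "/"})
    template.append(literal)
    return stylesheet


simple = ET.fromstring(
    '<html xsl:version="1.0" xmlns:xsl="' + XSLT_NS + '">'
    "<body><p>Total Amount:</p></body></html>"
)
full = expand_simplified(simple)
print(full.tag == XSL + "stylesheet", full[0].get("match"))  # True /
```

    Because the literal result element becomes the entire content of the single template, there is no place in the wrapped form for top-level elements, which matches the restriction stated above.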

    In some situations, the only way that a system can recognize that an XML document needs to be processed by an XSLT processor as an XSLT stylesheet is by examining the XML document itself. Using the simplified syntax makes this harder.

    For example, another XML language (AXL) might also use an axl:version on the document element to indicate that an XML document was an AXL document that required processing by an AXL processor; if a document had both an axl:version attribute and an xsl:version attribute, it would be unclear whether the document should be processed by an XSLT processor or an AXL processor.

    Therefore, the simplified syntax should not be used for XSLT stylesheets that may be used in such a situation. This situation can, for example, arise when an XSLT stylesheet is transmitted as a message with a MIME media type of text/xml or application/xml to a recipient that will use the MIME media type to determine how the message is processed.

    Qualified Names

    The name of an internal XSLT object, specifically a named template (see [Named Templates]), a mode (see [Modes]), an attribute set (see [Named Attribute Sets]), a key (see [Keys]), a decimal-format (see [Number Formatting]), a variable or a parameter (see [Variables and Parameters]) is specified as a QName. If it has a prefix, then the prefix is expanded into a URI reference using the namespace declarations in effect on the attribute in which the name occurs. The expanded-name consisting of the local part of the name and the possibly null URI reference is used as the name of the object. The default namespace is not used for unprefixed names.
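    The expansion rule can be illustrated with a short Python sketch. The function name expand_qname and the dictionary representation of the in-scope namespace declarations are assumptions made for this example; raising an error for an undeclared prefix is likewise an assumption, since this section does not say how that case is handled.

```python
def expand_qname(qname, in_scope_namespaces):
    """Expand a QName naming an internal XSLT object to (uri, local_part).

    `in_scope_namespaces` maps prefixes to namespace URIs. A default
    namespace stored under the key "" is deliberately never consulted:
    an unprefixed name always gets a null (None) namespace URI.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        return (None, qname)
    if prefix not in in_scope_namespaces:
        # Assumption: treat an undeclared prefix as an error.
        raise ValueError("undeclared namespace prefix: " + prefix)
    return (in_scope_namespaces[prefix], local)


declarations = {"my": "urn:example:my", "": "urn:example:default"}
print(expand_qname("my:total", declarations))  # ('urn:example:my', 'total')
print(expand_qname("total", declarations))     # (None, 'total')
```

    Two names with different prefixes bound to the same URI therefore expand to the same object name, while an unprefixed name never collides with a prefixed one.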

    Forwards-Compatible Processing

    An element enables forwards-compatible mode for itself, its attributes, its descendants and their attributes if either it is an xsl:stylesheet element whose version attribute is not equal to 1.0, or it is a literal result element that has an xsl:version attribute whose value is not equal to 1.0, or it is a literal result element that does not have an xsl:version attribute and that is the document element of a stylesheet using the simplified syntax (see [Literal Result Element as Stylesheet]). A literal result element that has an xsl:version attribute whose value is equal to 1.0 disables forwards-compatible mode for itself, its attributes, its descendants and their attributes.
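    The rule above amounts to a small decision procedure applied while walking down the stylesheet tree. The Python sketch below is one way to express it; the function name, the "stylesheet"/"literal"/"other" element kinds and the numeric comparison via float() are assumptions of this example, not terminology from the specification.

```python
def forwards_compatible(kind, version, inherited, is_simplified_root=False):
    """Decide whether an element is processed in forwards-compatible mode.

    kind      -- "stylesheet" (xsl:stylesheet or xsl:transform),
                 "literal" (a literal result element) or "other"
    version   -- string value of the version / xsl:version attribute, or None
    inherited -- the mode in effect on the parent element
    """
    if kind == "stylesheet":
        if version is None:
            raise ValueError("xsl:stylesheet requires a version attribute")
        return float(version) != 1.0
    if kind == "literal":
        if version is not None:
            # An explicit xsl:version either enables or disables the mode.
            return float(version) != 1.0
        if is_simplified_root:
            return True
    # Everything else inherits the mode of its parent.
    return inherited


print(forwards_compatible("stylesheet", "1.1", False))  # True
print(forwards_compatible("literal", "1.0", True))      # False
print(forwards_compatible("other", None, True))         # True
```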

    If an element is processed in forwards-compatible mode, then:

    if it is a top-level element and XSLT 1.0 does not allow such elements as top-level elements, then the element must be ignored along with its content;

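    if it is an element in a template and XSLT 1.0 does not allow such elements to occur in templates, then if the element is not instantiated, an error must not be signaled, and if the element is instantiated, the XSLT processor must perform fallback for the element as specified in [Fallback];

    if the element has an attribute that XSLT 1.0 does not allow the element to have or if the element has an optional attribute with a value that XSLT 1.0 does not allow the attribute to have, then the attribute must be ignored.

    Thus, any XSLT 1.0 processor must be able to process the following stylesheet without error, although the stylesheet includes elements from the XSLT namespace that are not defined in this specification:

    <xsl:stylesheet version="1.1"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <xsl:choose>
          <xsl:when test="system-property('xsl:version') &gt;= 1.1">
            <xsl:exciting-new-1.1-feature/>
          </xsl:when>
          <xsl:otherwise>
            <html>
            <head>
              <title>XSLT 1.1 required</title>
            </head>
            <body>
              <p>Sorry, this stylesheet requires XSLT 1.1.</p>
            </body>
            </html>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>

    If a stylesheet depends crucially on a top-level element introduced by a version of XSL after 1.0, then the stylesheet can use an xsl:message element with terminate="yes" (see [Messages]) to ensure that XSLT processors implementing earlier versions of XSL will not silently ignore the top-level element. For example,

    <xsl:stylesheet version="1.5"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:important-new-1.1-declaration/>

      <xsl:template match="/">
        <xsl:choose>
          <xsl:when test="system-property('xsl:version') &lt; 1.1">
            <xsl:message terminate="yes">
              <xsl:text>Sorry, this stylesheet requires XSLT 1.1.</xsl:text>
            </xsl:message>
          </xsl:when>
          <xsl:otherwise>
            ...
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
      ...
    </xsl:stylesheet>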

    If an expression occurs in an attribute that is processed in forwards-compatible mode, then an XSLT processor must recover from errors in the expression as follows:

    if the expression does not match the syntax allowed by the XPath grammar, then an error must not be signaled unless the expression is actually evaluated;

    if the expression calls a function with an unprefixed name that is not part of the XSLT library, then an error must not be signaled unless the function is actually called;

    if the expression calls a function with a number of arguments that XSLT does not allow or with arguments of types that XSLT does not allow, then an error must not be signaled unless the function is actually called.

    Combining Stylesheets

    XSLT provides two mechanisms to combine stylesheets:

    an inclusion mechanism that allows stylesheets to be combined without changing the semantics of the stylesheets being combined, and

    an import mechanism that allows stylesheets to override each other.

    Stylesheet Inclusion

    An XSLT stylesheet may include another XSLT stylesheet using an xsl:include element. The xsl:include element has an href attribute whose value is a URI reference identifying the stylesheet to be included. A relative URI is resolved relative to the base URI of the xsl:include element (see [Base URI]).

    The xsl:include element is only allowed as a top-level element.

    The inclusion works at the XML tree level. The resource located by the href attribute value is parsed as an XML document, and the children of the xsl:stylesheet element in this document replace the xsl:include element in the including document. The fact that template rules or definitions are included does not affect the way they are processed.
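    Tree-level inclusion can be sketched as follows in Python with xml.etree.ElementTree. The function resolve_includes and the load callback (which maps an href to a parsed xsl:stylesheet element) are hypothetical names for this illustration. The sketch assumes hrefs are already absolute, does not handle included stylesheets in the simplified syntax, and does not move xsl:import elements, so it is a simplification rather than a complete implementation.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
INCLUDE = "{" + XSLT_NS + "}include"


def resolve_includes(href, load, active=()):
    """Return the stylesheet at `href` with every xsl:include replaced by
    the children of the included xsl:stylesheet element.

    `load(href)` must return the parsed xsl:stylesheet element for a URI.
    `active` holds the URIs currently being expanded, to detect a
    stylesheet that directly or indirectly includes itself.
    """
    if href in active:
        raise ValueError("stylesheet includes itself: " + href)
    stylesheet = load(href)
    result = ET.Element(stylesheet.tag, dict(stylesheet.attrib))
    for child in stylesheet:
        if child.tag == INCLUDE:
            included = resolve_includes(child.get("href"), load, active + (href,))
            # The children of the included xsl:stylesheet replace xsl:include.
            result.extend(list(included))
        else:
            result.append(child)
    return result


def parse(body):
    return ET.fromstring(
        '<xsl:stylesheet version="1.0" xmlns:xsl="' + XSLT_NS + '">'
        + body + "</xsl:stylesheet>"
    )


docs = {
    "main.xsl": parse('<xsl:include href="a.xsl"/><xsl:template name="m"/>'),
    "a.xsl": parse('<xsl:template name="a"/>'),
}
merged = resolve_includes("main.xsl", docs.__getitem__)
print([t.get("name") for t in merged])  # ['a', 'm']
```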

    The included stylesheet may use the simplified syntax described in [Literal Result Element as Stylesheet]. The included stylesheet is treated the same as the equivalent xsl:stylesheet element.

    It is an error if a stylesheet directly or indirectly includes itself.

    Including a stylesheet multiple times can cause errors because of duplicate definitions. Such multiple inclusions are less obvious when they are indirect. For example, if stylesheet B includes stylesheet A, stylesheet C includes stylesheet A, and stylesheet D includes both stylesheet B and stylesheet C, then A will be included indirectly by D twice. If all of B, C and D are used as independent stylesheets, then the error can be avoided by separating everything in B other than the inclusion of A into a separate stylesheet B' and changing B to contain just inclusions of B' and A, similarly for C, and then changing D to include A, B', C'.

    Stylesheet Import

    An XSLT stylesheet may import another XSLT stylesheet using an xsl:import element. Importing a stylesheet is the same as including it (see [Stylesheet Inclusion]) except that definitions and template rules in the importing stylesheet take precedence over template rules and definitions in the imported stylesheet; this is described in more detail below. The xsl:import element has an href attribute whose value is a URI reference identifying the stylesheet to be imported. A relative URI is resolved relative to the base URI of the xsl:import element (see [Base URI]).

    The xsl:import element is only allowed as a top-level element. The xsl:import element children must precede all other element children of an xsl:stylesheet element, including any xsl:include element children. When xsl:include is used to include a stylesheet, any xsl:import elements in the included document are moved up in the including document to after any existing xsl:import elements in the including document.

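    For example,

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="article.xsl"/>
      <xsl:import href="bigfont.xsl"/>
      <xsl:attribute-set name="note-style">
        <xsl:attribute name="font-style">italic</xsl:attribute>
      </xsl:attribute-set>
    </xsl:stylesheet>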

    The xsl:stylesheet elements encountered during processing of a stylesheet that contains xsl:import elements are treated as forming an import tree. In the import tree, each xsl:stylesheet element has one import child for each xsl:import element that it contains. Any xsl:include elements are resolved before constructing the import tree. An xsl:stylesheet element in the import tree is defined to have lower import precedence than another xsl:stylesheet element in the import tree if it would be visited before that xsl:stylesheet element in a post-order traversal of the import tree (i.e. a traversal of the import tree in which an xsl:stylesheet element is visited after its import children). Each definition and template rule has import precedence determined by the xsl:stylesheet element that contains it.

    For example, suppose

    stylesheet A imports stylesheets B and C in that order;

    stylesheet B imports stylesheet D;

    stylesheet C imports stylesheet E.

    Then the order of import precedence (lowest first) is D, B, E, C, A.
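    The ordering in this example can be reproduced with a post-order traversal of the import tree. The Python sketch below represents the tree as a dictionary from each stylesheet to the list of stylesheets it imports, in document order; that representation and the function name are assumptions of this example.

```python
def import_precedence_order(root, imports):
    """List stylesheets from lowest to highest import precedence.

    `imports` maps a stylesheet name to the stylesheets it imports, in
    the order of its xsl:import elements. A stylesheet is visited only
    after all of its import children (post-order traversal).
    """
    order = []

    def visit(name):
        for child in imports.get(name, []):
            visit(child)
        order.append(name)

    visit(root)
    return order


imports = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
print(import_precedence_order("A", imports))  # ['D', 'B', 'E', 'C', 'A']
```

    The importing stylesheet always appears after everything it imports, so its definitions and template rules end up with the highest precedence.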

    Since xsl:import elements are required to occur before any definitions or template rules, an implementation that processes imported stylesheets at the point at which it encounters the xsl:import element will encounter definitions and template rules in increasing order of import precedence.

    In general, a definition or template rule with higher import precedence takes precedence over a definition or template rule with lower import precedence. This is defined in detail for each kind of definition and for template rules.

    It is an error if a stylesheet directly or indirectly imports itself. Apart from this, the case where a stylesheet with a particular URI is imported in multiple places is not treated specially. The import tree will have a separate xsl:stylesheet for each place that it is imported.

    If xsl:apply-imports is used (see [Overriding Template Rules]), the behavior may be different from the behavior if the stylesheet had been imported only at the place with the highest import precedence.

    Embedding Stylesheets

    Normally an XSLT stylesheet is a complete XML document with the xsl:stylesheet element as the document element. However, an XSLT stylesheet may also be embedded in another resource. Two forms of embedding are possible:

    the XSLT stylesheet may be textually embedded in a non-XML resource, or

    the xsl:stylesheet element may occur in an XML document other than as the document element.

    To facilitate the second form of embedding, the xsl:stylesheet element is allowed to have an ID attribute that specifies a unique identifier.

    In order for such an attribute to be used with the XPath id function, it must actually be declared in the DTD as being an ID.

    The following example shows how the xml-stylesheet processing instruction can be used to allow a document to contain its own stylesheet. The URI reference uses a relative URI with a fragment identifier to locate the xsl:stylesheet element:

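    <?xml-stylesheet type="text/xml" href="#style1"?>
    <!DOCTYPE doc SYSTEM "doc.dtd">
    <doc>
    <head>
    <xsl:stylesheet id="style1"
                    version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:import href="doc.xsl"/>
    <xsl:template match="id('foo')">
      <fo:block font-weight="bold"><xsl:apply-templates/></fo:block>
    </xsl:template>
    <xsl:template match="xsl:stylesheet">
      <!-- ignore -->
    </xsl:template>
    </xsl:stylesheet>
    </head>
    <body>
    <para id="foo">
    ...
    </para>
    </body>
    </doc>

    A stylesheet that is embedded in the document to which it is to be applied or that may be included or imported into a stylesheet that is so embedded typically needs to contain a template rule that specifies that xsl:stylesheet elements are to be ignored.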

    Data Model

    The data model used by XSLT is the same as that used by XPath with the additions described in this section. XSLT operates on source, result and stylesheet documents using the same data model. Any two XML documents that have the same tree will be treated the same by XSLT.

    Processing instructions and comments in the stylesheet are ignored: the stylesheet is treated as if neither processing instruction nodes nor comment nodes were included in the tree that represents the stylesheet.

    Root Node Children

    The normal restrictions on the children of the root node are relaxed for the result tree. The result tree may have any sequence of nodes as children that would be possible for an element node. In particular, it may have text node children, and any number of element node children. When written out using the XML output method (see [Output]), it is possible that a result tree will not be a well-formed XML document; however, it will always be a well-formed external general parsed entity.

    When the source tree is created by parsing a well-formed XML document, the root node of the source tree will automatically satisfy the normal restrictions of having no text node children and exactly one element child. When the source tree is created in some other way, for example by using the DOM, the usual restrictions are relaxed for the source tree as for the result tree.

    Base URI

    Every node also has an associated URI called its base URI, which is used for resolving attribute values that represent relative URIs into absolute URIs. If an element or processing instruction occurs in an external entity, the base URI of that element or processing instruction is the URI of the external entity; otherwise, the base URI is the base URI of the document. The base URI of the document node is the URI of the document entity. The base URI for a text node, a comment node, an attribute node or a namespace node is the base URI of the parent of the node.
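    As a small illustration of how a base URI is used, the Python sketch below resolves a relative href (for example on xsl:include or xsl:import) against the base URI of the element that carries it. It relies on urllib.parse.urljoin from the standard library; the example URIs are invented for this sketch.

```python
from urllib.parse import urljoin


def resolve_href(base_uri, href):
    """Resolve a possibly relative URI reference against a node's base URI."""
    return urljoin(base_uri, href)


base = "http://example.org/styles/main.xsl"
print(resolve_href(base, "common/util.xsl"))
# http://example.org/styles/common/util.xsl
print(resolve_href(base, "../other.xsl"))
# http://example.org/other.xsl
print(resolve_href(base, "http://example.net/abs.xsl"))
# http://example.net/abs.xsl
```

    An element that occurs in an external entity would pass the URI of that entity as base_uri instead of the URI of the document.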

    Unparsed Entities

    The root node has a mapping that gives the URI for each unparsed entity declared in the document's DTD. The URI is generated from the system identifier and public identifier specified in the entity declaration. The XSLT processor may use the public identifier to generate a URI for the entity instead of the URI specified in the system identifier. If the XSLT processor does not use the public identifier to generate the URI, it must use the system identifier; if the system identifier is a relative URI, it must be resolved into an absolute URI using the URI of the resource containing the entity declaration as the base URI [RFC2396].

    Whitespace Stripping

    After the tree for a source document or stylesheet document has been constructed, but before it is otherwise processed by XSLT, some text nodes are stripped. A text node is never stripped unless it contains only whitespace characters. Stripping the text node removes the text node from the tree. The stripping process takes as input a set of element names for which whitespace must be preserved. The stripping process is applied to both stylesheets and source documents, but the set of whitespace-preserving element names is determined differently for stylesheets and for source documents.

    A text node is preserved if any of the following apply:

    The element name of the parent of the text node is in the set of whitespace-preserving element names.

    The text node contains at least one non-whitespace character. As in XML, a whitespace character is #x20, #x9, #xD or #xA.

    An ancestor element of the text node has an xml:space attribute with a value of preserve, and no closer ancestor element has xml:space with a value of default.

    Otherwise, the text node is stripped.
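    The three preservation conditions can be written as a single predicate. The Python sketch below is an illustration only; the function name and the way xml:space values of the ancestors are passed in (a list ordered from the parent outward, with None where the attribute is absent) are assumptions of this example.

```python
XML_WHITESPACE = " \t\r\n"  # #x20, #x9, #xD, #xA


def is_preserved(text, parent_name, preserving_names, ancestor_xml_space):
    """Return True if a text node survives whitespace stripping.

    text               -- string value of the text node
    parent_name        -- element name of the text node's parent
    preserving_names   -- set of whitespace-preserving element names
    ancestor_xml_space -- xml:space values of the ancestors, nearest
                          first, with None where the attribute is absent
    """
    if parent_name in preserving_names:
        return True
    if text.strip(XML_WHITESPACE) != "":
        return True  # at least one non-whitespace character
    for value in ancestor_xml_space:
        if value == "preserve":
            return True
        if value == "default":
            return False  # a closer ancestor switched preservation off
    return False


print(is_preserved("\n  ", "para", {"xsl:text"}, [None, None]))        # False
print(is_preserved(" ", "xsl:text", {"xsl:text"}, [None]))            # True
print(is_preserved(" ", "para", set(), [None, "preserve"]))           # True
print(is_preserved(" ", "para", set(), ["default", "preserve"]))      # False
```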

    The xml:space attributes are not stripped from the tree.

    This implies that if an xml:space attribute is specified on a literal result element, it will be included in the result.

    For stylesheets, the set of whitespace-preserving element names consists of just xsl:text.

    For source documents, the set of whitespace-preserving element names is specified by xsl:strip-space and xsl:preserve-space top-level elements. These elements each have an elements attribute whose value is a whitespace-separated list of NameTests. Initially, the set of whitespace-preserving element names contains all element names. If an element name matches a NameTest in an xsl:strip-space element, then it is removed from the set of whitespace-preserving element names. If an element name matches a NameTest in an xsl:preserve-space element, then it is added to the set of whitespace-preserving element names. An element matches a NameTest if and only if the NameTest would be true for the element as an XPath node test. Conflicts between matches to xsl:strip-space and xsl:preserve-space elements are resolved the same way as conflicts between template rules (see [Conflict Resolution for Template Rules]). Thus, the applicable match for a particular element name is determined as follows:

    First, any match with lower import precedence than another match is ignored.

    Next, any match with a NameTest that has a lower default priority than the default priority of the NameTest of another match is ignored.

    It is an error if this leaves more than one match. An XSLT processor may signal the error; if it does not signal the error, it must recover by choosing, from amongst the matches that are left, the one that occurs last in the stylesheet.
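
    As a minimal sketch of how these declarations combine, the following stylesheet fragment strips whitespace-only text nodes everywhere except inside two element types. The element names pre and code are assumptions chosen for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "*" matches every element name, so every name is removed from
       the set of whitespace-preserving element names. -->
  <xsl:strip-space elements="*"/>

  <!-- The hypothetical names "pre" and "code" are added back to the set. -->
  <xsl:preserve-space elements="pre code"/>

</xsl:stylesheet>
```

    Both declarations match a pre element. The NameTest pre has a higher default priority than the NameTest *, so the xsl:preserve-space match is the applicable one and no error arises.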

    Expressions

    XSLT uses the expression language defined by XPath. Expressions are used in XSLT for a variety of purposes including:

    selecting nodes for processing;

    specifying conditions for different ways of processing a node;

    generating text to be inserted in the result tree.

    An expression must match the XPath production Expr.

    Expressions occur as the value of certain attributes on XSLT-defined elements and within curly braces in attribute value templates.

    In XSLT, an outermost expression (i.e. an expression that is not part of another expression) gets its context as follows:

    the context node comes from the current node

    the context position comes from the position of the current node in the current node list; the first position is 1

    the context size comes from the size of the current node list

    the variable bindings are the bindings in scope on the element which has the attribute in which the expression occurs (see )

    the set of namespace declarations are those in scope on the element which has the attribute in which the expression occurs; this includes the implicit declaration of the prefix xml required by the XML Namespaces Recommendation; the default namespace (as declared by xmlns) is not part of this set

    the function library consists of the core function library together with the additional functions defined in and extension functions as described in ; it is an error for an expression to include a call to any other function
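
    The context position and context size described above can be observed with the position and last functions. The following is a sketch only; the source element names items and item and the result element names list and entry are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="items">
    <list>
      <!-- The selected item elements become the current node list. -->
      <xsl:apply-templates select="item"/>
    </list>
  </xsl:template>

  <xsl:template match="item">
    <!-- Inside the curly braces, position() returns the context
         position (starting at 1) and last() the context size. -->
    <entry index="{position()}" of="{last()}">
      <xsl:value-of select="."/>
    </entry>
  </xsl:template>

</xsl:stylesheet>
```

    Because the select attribute picks only item elements, whitespace text nodes between them do not take part in the numbering.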

    Template Rules

    Processing Model

    A list of source nodes is processed to create a result tree fragment. The result tree is constructed by processing a list containing just the root node. A list of source nodes is processed by appending the result tree structure created by processing each of the members of the list in order. A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them; the chosen rule's template is then instantiated with the node as the current node and with the list of source nodes as the current node list. A template typically contains instructions that select an additional list of source nodes for processing. The process of matching, instantiation and selection is continued recursively until no new source nodes are selected for processing.

    Implementations are free to process the source document in any way that produces the same result as if it were processed using this processing model.

    Patterns

    Template rules identify the nodes to which they apply by using a pattern. As well as being used in template rules, patterns are used for numbering (see ) and for declaring keys (see ). A pattern specifies a set of conditions on a node. A node that satisfies the conditions matches the pattern; a node that does not satisfy the conditions does not match the pattern. The syntax for patterns is a subset of the syntax for expressions. In particular, location paths that meet certain restrictions can be used as patterns. An expression that is also a pattern always evaluates to an object of type node-set. A node matches a pattern if the node is a member of the result of evaluating the pattern as an expression with respect to some possible context; the possible contexts are those whose context node is the node being matched or one of its ancestors.

    Here are some examples of patterns:

    para matches any para element

    * matches any element

    chapter|appendix matches any chapter element and any appendix element

    olist/item matches any item element with an olist parent

    appendix//para matches any para element with an appendix ancestor element

    / matches the root node

    text() matches any text node

    processing-instruction() matches any processing instruction

    node() matches any node other than an attribute node and the root node

    id("W11") matches the element with unique ID W11

    para[1] matches any para element that is the first para child element of its parent

    *[position()=1 and self::para] matches any para element that is the first child element of its parent

    para[last()=1] matches any para element that is the only para child element of its parent

    items/item[position()>1] matches any item element that has an items parent and that is not the first item child of its parent

    item[position() mod 2 = 1] matches any item element that is an odd-numbered item child of its parent

    div[@class="appendix"]//p matches any p element with a div ancestor element that has a class attribute with value appendix

    @class matches any class attribute (not any element that has a class attribute)

    @* matches any attribute

    A pattern must match the grammar for Pattern. A Pattern is a set of location path patterns separated by |. A location path pattern is a location path whose steps all use only the child or attribute axes. Although patterns must not use the descendant-or-self axis, patterns may use the // operator as well as the / operator. Location path patterns can also start with an id or key function call with a literal argument. Predicates in a pattern can use arbitrary expressions just like predicates in a location path.

    Patterns

    Pattern ::= LocationPathPattern | Pattern '|' LocationPathPattern

    LocationPathPattern ::= '/' RelativePathPattern? | IdKeyPattern (('/' | '//') RelativePathPattern)? | '//'? RelativePathPattern

    IdKeyPattern ::= 'id' '(' Literal ')' | 'key' '(' Literal ',' Literal ')'

    RelativePathPattern ::= StepPattern | RelativePathPattern '/' StepPattern | RelativePathPattern '//' StepPattern

    StepPattern ::= ChildOrAttributeAxisSpecifier NodeTest Predicate*

    ChildOrAttributeAxisSpecifier ::= AbbreviatedAxisSpecifier | ('child' | 'attribute') '::'

    A pattern is defined to match a node if and only if there is a possible context such that when the pattern is evaluated as an expression with that context, the node is a member of the resulting node-set. When a node is being matched, the possible contexts have a context node that is the node being matched or any ancestor of that node, and a context node list containing just the context node.

    For example, p matches any p element, because for any p if the expression p is evaluated with the parent of the p element as context the resulting node-set will contain that p element as one of its members.

    This matches even a p element that is the document element, since the document root is the parent of the document element.

    Although the semantics of patterns are specified indirectly in terms of expression evaluation, it is easy to understand the meaning of a pattern directly without thinking in terms of expression evaluation. In a pattern, | indicates alternatives; a pattern with one or more | separated alternatives matches if any one of the alternatives matches. A pattern that consists of a sequence of StepPatterns separated by / or // is matched from right to left. The pattern only matches if the rightmost StepPattern matches and a suitable element matches the rest of the pattern; if the separator is / then only the parent is a suitable element; if the separator is //, then any ancestor is a suitable element. A StepPattern that uses the child axis matches if the NodeTest is true for the node and the node is not an attribute node. A StepPattern that uses the attribute axis matches if the NodeTest is true for the node and the node is an attribute node. When [] is present, then the first PredicateExpr in a StepPattern is evaluated with the node being matched as the context node and the siblings of the context node that match the NodeTest as the context node list, unless the node being matched is an attribute node, in which case the context node list is all the attributes that have the same parent as the attribute being matched and that match the NameTest.

    For example

    appendix//ulist/item[position()=1]

    matches a node if and only if all of the following are true:

    the NodeTest item is true for the node and the node is not an attribute; in other words the node is an item element

    evaluating the PredicateExpr position()=1 with the node as context node and the siblings of the node that are item elements as the context node list yields true

    the node has a parent that matches appendix//ulist; this will be true if the parent is a ulist element that has an appendix ancestor element.
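
    The pattern analysed above might be used in a stylesheet as sketched below. The result element names first-item and other-item are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches an item only if it is the first item child of a ulist
       that has an appendix ancestor. -->
  <xsl:template match="appendix//ulist/item[position()=1]">
    <first-item><xsl:apply-templates/></first-item>
  </xsl:template>

  <!-- Matches every item element. For a first item inside an appendix
       both rules match, and the more specific rule above is chosen. -->
  <xsl:template match="item">
    <other-item><xsl:apply-templates/></other-item>
  </xsl:template>

</xsl:stylesheet>
```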

    Defining Template Rules

    A template rule is specified with the xsl:template element. The match attribute is a Pattern that identifies the source node or nodes to which the rule applies. The match attribute is required unless the xsl:template element has a name attribute (see ). It is an error for the value of the match attribute to contain a VariableReference. The content of the xsl:template element is the template that is instantiated when the template rule is applied.

    For example, an XML document might contain:

    This is an <emph>important</emph> point.

    The following template rule matches emph elements and produces a fo:inline-sequence formatting object with a font-weight property of bold.

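    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>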

    Examples in this document use the fo: prefix for the namespace http://www.w3.org/1999/XSL/Format, which is the namespace of the formatting objects defined in .

    As described next, the xsl:apply-templates element recursively processes the children of the source element.

    Applying Template Rules

    This example creates a block for a chapter element and then processes its immediate children.

    <xsl:template match="chapter">
      <fo:block>
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>

    In the absence of a select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes. However, text nodes that have been stripped as specified in will not be processed. If stripping of whitespace nodes has not been enabled for an element, then all whitespace in the content of the element will be processed as text, and thus whitespace between child elements will count in determining the position of a child element as returned by the position function.

    A select attribute can be used to process nodes selected by an expression instead of processing all children. The value of the select attribute is an expression. The expression must evaluate to a node-set. The selected set of nodes is processed in document order, unless a sorting specification is present (see ). The following example processes all of the author children of the author-group:

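    <xsl:template match="author-group">
      <fo:inline-sequence>
        <xsl:apply-templates select="author"/>
      </fo:inline-sequence>
    </xsl:template>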

    The following example processes all of the given-names of the authors that are children of author-group:

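    <xsl:template match="author-group">
      <fo:inline-sequence>
        <xsl:apply-templates select="author/given-name"/>
      </fo:inline-sequence>
    </xsl:template>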

    This example processes all of the heading descendant elements of the book element.

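    <xsl:template match="book">
      <fo:block>
        <xsl:apply-templates select=".//heading"/>
      </fo:block>
    </xsl:template>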

    It is also possible to process elements that are not descendants of the current node. This example assumes that a department element has group children and employee descendants. It finds an employee's department and then processes the group children of the department.

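    <xsl:template match="employee">
      <fo:block>
        Employee <xsl:apply-templates select="name"/> belongs to group
        <xsl:apply-templates select="ancestor::department/group"/>
      </fo:block>
    </xsl:template>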

    Multiple xsl:apply-templates elements can be used within a single template to do simple reordering. The following example creates two HTML tables. The first table is filled with domestic sales while the second table is filled with foreign sales.

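    <xsl:template match="product">
      <table>
        <xsl:apply-templates select="sales/domestic"/>
      </table>
      <table>
        <xsl:apply-templates select="sales/foreign"/>
      </table>
    </xsl:template>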

    It is possible for there to be two matching descendants where one is a descendant of the other. This case is not treated specially: both descendants will be processed as usual. For example, given a source document

    <doc><div><div></div></div></doc>

    the rule

    <xsl:template match="doc">
      <xsl:apply-templates select=".//div"/>
    </xsl:template>

    will process both the outer div and inner div elements.

    Typically, xsl:apply-templates is used to process only nodes that are descendants of the current node. Such use of xsl:apply-templates cannot result in non-terminating processing loops. However, when xsl:apply-templates is used to process elements that are not descendants of the current node, the possibility arises of non-terminating loops. For example,

    <xsl:template match="foo">
      <xsl:apply-templates select="."/>
    </xsl:template>

    Implementations may be able to detect such loops in some cases, but the possibility exists that a stylesheet may enter a non-terminating loop that an implementation is unable to detect. This may present a denial of service security risk.

    Conflict Resolution for Template Rules

    It is possible for a source node to match more than one template rule. The template rule to be used is determined as follows:

    First, all matching template rules that have lower import precedence than the matching template rule or rules with the highest import precedence are eliminated from consideration.

    Next, all matching template rules that have lower priority than the matching template rule or rules with the highest priority are eliminated from consideration. The priority of a template rule is specified by the priority attribute on the template rule. The value of this must be a real number (positive or negative), matching the production Number with an optional leading minus sign (-). The default priority is computed as follows:

    If the pattern contains multiple alternatives separated by |, then it is treated equivalently to a set of template rules, one for each alternative.

    If the pattern has the form of a QName preceded by a ChildOrAttributeAxisSpecifier or has the form processing-instruction(Literal) preceded by a ChildOrAttributeAxisSpecifier, then the priority is 0.

    If the pattern has the form NCName:* preceded by a ChildOrAttributeAxisSpecifier, then the priority is -0.25.

    Otherwise, if the pattern consists of just a NodeTest preceded by a ChildOrAttributeAxisSpecifier, then the priority is -0.5.

    Otherwise, the priority is 0.5.

    Thus, the most common kind of pattern (a pattern that tests for a node with a particular type and a particular expanded-name) has priority 0. The next less specific kind of pattern (a pattern that tests for a node with a particular type and an expanded-name with a particular namespace URI) has priority -0.25. Patterns less specific than this (patterns that just test for nodes with particular types) have priority -0.5. Patterns more specific than the most common kind of pattern have priority 0.5.

    It is an error if this leaves more than one matching template rule. An XSLT processor may signal the error; if it does not signal the error, it must recover by choosing, from amongst the matching template rules that are left, the one that occurs last in the stylesheet.
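
    The default priorities can be illustrated with a sketch such as the following. The element names para and note, the role attribute, and the result markup are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Just a NodeTest: default priority -0.5. -->
  <xsl:template match="*">
    <any><xsl:apply-templates/></any>
  </xsl:template>

  <!-- A QName: default priority 0. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- More than one step: default priority 0.5. -->
  <xsl:template match="note/para">
    <p class="note"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The explicit priority replaces the default of 0.5. -->
  <xsl:template match="para[@role='warning']" priority="2">
    <p class="warning"><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

    Without the priority attribute on the last rule, a para element with role="warning" inside a note would match two rules of priority 0.5, which is the error case described above. With the explicit priority of 2, the last rule is chosen.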

    Overriding Template Rules

    A template rule that is being used to override a template rule in an imported stylesheet (see ) can use the xsl:apply-imports element to invoke the overridden template rule.

    At any point in the processing of a stylesheet, there is a current template rule. Whenever a template rule is chosen by matching a pattern, the template rule becomes the current template rule for the instantiation of the rule's template. When an xsl:for-each element is instantiated, the current template rule becomes null for the instantiation of the content of the xsl:for-each element.

    xsl:apply-imports processes the current node using only template rules that were imported into the stylesheet element containing the current template rule; the node is processed in the current template rule's mode. It is an error if xsl:apply-imports is instantiated when the current template rule is null.

    For example, suppose the stylesheet doc.xsl contains a template rule for example elements:

    <xsl:template match="example">
      <pre><xsl:apply-templates/></pre>
    </xsl:template>

    Another stylesheet could import doc.xsl and modify the treatment of example elements as follows:

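    <xsl:import href="doc.xsl"/>

    <xsl:template match="example">
      <div style="border: solid red">
        <xsl:apply-imports/>
      </div>
    </xsl:template>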

    The combined effect would be to transform an example into an element of the form:

    <div style="border: solid red"><pre>...</pre></div>

    Modes

    Modes allow an element to be processed multiple times, each time producing a different result.

    Both xsl:template and xsl:apply-templates have an optional mode attribute. The value of the mode attribute is a QName, which is expanded as described in . If xsl:template does not have a match attribute, it must not have a mode attribute. If an xsl:apply-templates element has a mode attribute, then it applies only to those template rules from xsl:template elements that have a mode attribute with the same value; if an xsl:apply-templates element does not have a mode attribute, then it applies only to those template rules from xsl:template elements that do not have a mode attribute.
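
    As a sketch of how modes let the same elements be processed twice, the following stylesheet first builds a table of contents and then the body. The source element names doc, section, title and para and the mode name toc are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="doc">
    <html>
      <body>
        <!-- First pass: only rules with mode="toc" are considered. -->
        <ul><xsl:apply-templates select="section" mode="toc"/></ul>
        <!-- Second pass: only rules without a mode are considered. -->
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="section" mode="toc">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

  <xsl:template match="section">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```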

    Built-in Template Rules

    There is a built-in template rule to allow recursive processing to continue in the absence of a successful pattern match by an explicit template rule in the stylesheet. This template rule applies to both element nodes and the root node. The following shows the equivalent of the built-in template rule:

    <xsl:template match="*|/">
      <xsl:apply-templates/>
    </xsl:template>

    There is also a built-in template rule for each mode, which allows recursive processing to continue in the same mode in the absence of a successful pattern match by an explicit template rule in the stylesheet. This template rule applies to both element nodes and the root node. The following shows the equivalent of the built-in template rule for mode m.

    <xsl:template match="*|/" mode="m"> <xsl:apply-templates mode="m"/> </xsl:template>

    There is also a built-in template rule for text and attribute nodes that copies text through:

    <xsl:template match="text()|@*">
      <xsl:value-of select="."/>
    </xsl:template>

    The built-in template rule for processing instructions and comments is to do nothing.

    <xsl:template match="processing-instruction()|comment()"/>

    The built-in template rule for namespace nodes is also to do nothing. There is no pattern that can match a namespace node; so, the built-in template rule is the only template rule that is applied for namespace nodes.

    The built-in template rules are treated as if they were imported implicitly before the stylesheet and so have lower import precedence than all other template rules. Thus, the author can override a built-in template rule by including an explicit template rule.
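
    For instance, a stylesheet that wants only selected content in its output might override the built-in rule for text nodes, as in this sketch. The element names title and heading are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Overrides the built-in rule for text nodes: text reached by
       xsl:apply-templates now produces no output. Attribute nodes
       are still handled by the built-in rule. -->
  <xsl:template match="text()"/>

  <!-- The hypothetical title element is output explicitly;
       xsl:value-of does not depend on the text() rule above. -->
  <xsl:template match="title">
    <heading><xsl:value-of select="."/></heading>
  </xsl:template>

</xsl:stylesheet>
```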

    Named Templates

    Templates can be invoked by name. An xsl:template element with a name attribute specifies a named template. The value of the name attribute is a QName, which is expanded as described in . If an xsl:template element has a name attribute, it may, but need not, also have a match attribute. An xsl:call-template element invokes a template by name; it has a required name attribute that identifies the template to be invoked. Unlike xsl:apply-templates, xsl:call-template does not change the current node or the current node list.

    The match, mode and priority attributes on an xsl:template element do not affect whether the template is invoked by an xsl:call-template element. Similarly, the name attribute on an xsl:template element does not affect whether the template is invoked by an xsl:apply-templates element.

    It is an error if a stylesheet contains more than one template with the same name and same import precedence.
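
    A minimal sketch of a named template and its invocation follows. The template name rule and the element names chapter, title and para are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A named template with no match attribute; it is invoked only
       by xsl:call-template. -->
  <xsl:template name="rule">
    <hr/>
    <!-- The current node is unchanged by xsl:call-template, so title
         is selected relative to the chapter being processed. -->
    <p><xsl:value-of select="title"/></p>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:call-template name="rule"/>
    <xsl:apply-templates select="para"/>
  </xsl:template>

</xsl:stylesheet>
```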

    Creating the Result Tree

    This section describes instructions that directly create nodes in the result tree.

    Creating Elements and Attributes

    Literal Result Elements

    In a template, an element in the stylesheet that does not belong to the XSLT namespace and that is not an extension element (see ) is instantiated to create an element node with the same expanded-name. The content of the element is a template, which is instantiated to give the content of the created element node. The created element node will have the attribute nodes that were present on the element node in the stylesheet tree, other than attributes with names in the XSLT namespace.

    The created element node will also have a copy of the namespace nodes that were present on the element node in the stylesheet tree with the exception of any namespace node whose string-value is the XSLT namespace URI (http://www.w3.org/1999/XSL/Transform), a namespace URI declared as an extension namespace (see ), or a namespace URI designated as an excluded namespace. A namespace URI is designated as an excluded namespace by using an exclude-result-prefixes attribute on an xsl:stylesheet element or an xsl:exclude-result-prefixes attribute on a literal result element. The value of both these attributes is a whitespace-separated list of namespace prefixes. The namespace bound to each of the prefixes is designated as an excluded namespace. It is an error if there is no namespace bound to the prefix on the element bearing the exclude-result-prefixes or xsl:exclude-result-prefixes attribute. The default namespace (as declared by xmlns) may be designated as an excluded namespace by including #default in the list of namespace prefixes. The designation of a namespace as an excluded namespace is effective within the subtree of the stylesheet rooted at the element bearing the exclude-result-prefixes or xsl:exclude-result-prefixes attribute; a subtree rooted at an xsl:stylesheet element does not include any stylesheets imported or included by children of that xsl:stylesheet element.

    When a stylesheet uses a namespace declaration only for the purposes of addressing the source tree, specifying the prefix in the exclude-result-prefixes attribute will avoid superfluous namespace declarations in the result tree.
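
    A sketch of this use of exclude-result-prefixes follows. The prefix src, the namespace URI http://example.org/source, and the element names record, name and row are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:src="http://example.org/source"
                exclude-result-prefixes="src">

  <!-- The src prefix is used only to address the source tree. -->
  <xsl:template match="src:record">
    <!-- Without exclude-result-prefixes, this literal result element
         would carry a copy of the namespace node for src. -->
    <row><xsl:value-of select="src:name"/></row>
  </xsl:template>

</xsl:stylesheet>
```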

    The value of an attribute of a literal result element is interpreted as an attribute value template: it can contain expressions contained in curly braces ({}).

    A namespace URI in the stylesheet tree that is being used to specify a namespace URI in the result tree is called a literal namespace URI. This applies to:

    the namespace URI in the expanded-name of a literal result element in the stylesheet

    the namespace URI in the expanded-name of an attribute specified on a literal result element in the stylesheet

    the string-value of a namespace node on a literal result element in the stylesheet

    A stylesheet can use the xsl:namespace-alias element to declare that one namespace URI is an alias for another namespace URI. When a literal namespace URI has been declared to be an alias for another namespace URI, then the namespace URI in the result tree will be the namespace URI that the literal namespace URI is an alias for, instead of the literal namespace URI itself. The xsl:namespace-alias element declares that the namespace URI bound to the prefix specified by the stylesheet-prefix attribute is an alias for the namespace URI bound to the prefix specified by the result-prefix attribute. Thus, the stylesheet-prefix attribute specifies the namespace URI that will appear in the stylesheet, and the result-prefix attribute specifies the corresponding namespace URI that will appear in the result tree. The default namespace (as declared by xmlns) may be specified by using #default instead of a prefix. If a namespace URI is declared to be an alias for multiple different namespace URIs, then the declaration with the highest import precedence is used. It is an error if there is more than one such declaration. An XSLT processor may signal the error; if it does not signal the error, it must recover by choosing, from amongst the declarations with the highest import precedence, the one that occurs last in the stylesheet.

    When literal result elements are being used to create element, attribute, or namespace nodes that use the XSLT namespace URI, the stylesheet must use an alias. For example, the stylesheet

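    <xsl:stylesheet
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:fo="http://www.w3.org/1999/XSL/Format"
      xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

    <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

    <xsl:template match="/">
      <axsl:stylesheet>
        <xsl:apply-templates/>
      </axsl:stylesheet>
    </xsl:template>

    <xsl:template match="block">
      <axsl:template match="{.}">
        <fo:block><axsl:apply-templates/></fo:block>
      </axsl:template>
    </xsl:template>

    </xsl:stylesheet>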

    will generate an XSLT stylesheet from a document of the form:

    <elements>
    <block>p</block>
    <block>h1</block>
    <block>h2</block>
    <block>h3</block>
    <block>h4</block>
    </elements>

    It may be necessary also to use aliases for namespaces other than the XSLT namespace URI. For example, literal result elements belonging to a namespace dealing with digital signatures might cause XSLT stylesheets to be mishandled by general-purpose security software; using an alias for the namespace would avoid the possibility of such mishandling.

    Creating Elements with xsl:element

    The xsl:element element allows an element to be created with a computed name. The expanded-name of the element to be created is specified by a required name attribute and an optional namespace attribute. The content of the xsl:element element is a template for the attributes and children of the created element.

    The name attribute is interpreted as an attribute value template. It is an error if the string that results from instantiating the attribute value template is not a QName. An XSLT processor may signal the error; if it does not signal the error, then it must recover by making the result of instantiating the xsl:element element be the sequence of nodes created by instantiating the content of the xsl:element element, excluding any initial attribute nodes. If the namespace attribute is not present then the QName is expanded into an expanded-name using the namespace declarations in effect for the xsl:element element, including any default namespace declaration.

    If the namespace attribute is present, then it also is interpreted as an attribute value template. The string that results from instantiating the attribute value template should be a URI reference. It is not an error if the string is not a syntactically legal URI reference. If the string is empty, then the expanded-name of the element has a null namespace URI. Otherwise, the string is used as the namespace URI of the expanded-name of the element to be created. The local part of the QName specified by the name attribute is used as the local part of the expanded-name of the element to be created.

    XSLT processors may make use of the prefix of the QName specified in the name attribute when selecting the prefix used for outputting the created element as XML; however, they are not required to do so.
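
    As a sketch of a computed element name, the following rule names each result element after an attribute of the source element. The source element field and its name attribute are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Turns <field name="price">10</field> into <price>10</price>. -->
  <xsl:template match="field">
    <!-- The curly braces make name an attribute value template; the
         value of the source name attribute must be a QName. -->
    <xsl:element name="{@name}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

    No namespace attribute is given and the stylesheet declares no default namespace, so the created elements have a null namespace URI.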

    Creating Attributes with xsl:attribute

    The xsl:attribute element can be used to add attributes to result elements whether created by literal result elements in the stylesheet or by instructions such as xsl:element. The expanded-name of the attribute to be created is specified by a required name attribute and an optional namespace attribute. Instantiating an xsl:attribute element adds an attribute node to the containing result element node. The content of the xsl:attribute element is a template for the value of the created attribute.

    The name attribute is interpreted as an attribute value template. It is an error if the string that results from instantiating the attribute value template is not a QName or is the string xmlns. An XSLT processor may signal the error; if it does not signal the error, it must recover by not adding the attribute to the result tree. If the namespace attribute is not present, then the QName is expanded into an expanded-name using the namespace declarations in effect for the xsl:attribute element, not including any default namespace declaration.

    If the namespace attribute is present, then it also is interpreted as an attribute value template. The string that results from instantiating it should be a URI reference. It is not an error if the string is not a syntactically legal URI reference. If the string is empty, then the expanded-name of the attribute has a null namespace URI. Otherwise, the string is used as the namespace URI of the expanded-name of the attribute to be created. The local part of the QName specified by the name attribute is used as the local part of the expanded-name of the attribute to be created.

    XSLT processors may make use of the prefix of the QName specified in the name attribute when selecting the prefix used for outputting the created attribute as XML; however, they are not required to do so and, if the prefix is xmlns, they must not do so. Thus, although it is not an error to do:

    <xsl:attribute name="xmlns:xsl" namespace="whatever">http://www.w3.org/1999/XSL/Transform</xsl:attribute>

    it will not result in a namespace declaration being output.

    Adding an attribute to an element replaces any existing attribute of that element with the same expanded-name.

    The following are all errors:

    Adding an attribute to an element after children have been added to it; implementations may either signal the error or ignore the attribute.

    Adding an attribute to a node that is not an element; implementations may either signal the error or ignore the attribute.

    Creating nodes other than text nodes during the instantiation of the content of the xsl:attribute element; implementations may either signal the error or ignore the offending nodes.
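
    To stay clear of the first error above, attributes are added before any content, as in this sketch. The source element link, its target attribute, and the result element a are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="link">
    <a>
      <!-- The attribute is added while the a element still has no
           children. -->
      <xsl:attribute name="href">
        <xsl:value-of select="@target"/>
      </xsl:attribute>
      <!-- Only now is content added to the a element. -->
      <xsl:apply-templates/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```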

    When an xsl:attribute contains a text node with a newline, then the XML output must contain a character reference. For example,

    <xsl:attribute name="a">x
y</xsl:attribute>

    will result in the output

    a="x&#xA;y"

    (or with any equivalent character reference). The XML output cannot be

    a="x
y"

    This is because XML 1.0 requires newline characters in attribute values to be normalized into spaces but requires character references to newline characters not to be normalized. The attribute values in the data model represent the attribute value after normalization. If a newline occurring in an attribute value in the tree were output as a newline character rather than as character reference, then the attribute value in the tree created by reparsing the XML would contain a space not a newline, which would mean that the tree had not been output correctly.

    Named Attribute Sets

    The xsl:attribute-set element defines a named set of attributes. The name attribute specifies the name of the attribute set. The value of the name attribute is a QName, which is expanded as described in . The content of the xsl:attribute-set element consists of zero or more xsl:attribute elements that specify the attributes in the set.

    Attribute sets are used by specifying a use-attribute-sets attribute on xsl:element, xsl:copy (see ) or xsl:attribute-set elements. The value of the use-attribute-sets attribute is a whitespace-separated list of names of attribute sets. Each name is specified as a QName, which is expanded as described in . Specifying a use-attribute-sets attribute is equivalent to adding xsl:attribute elements for each of the attributes in each of the named attribute sets to the beginning of the content of the element with the use-attribute-sets attribute, in the same order in which the names of the attribute sets are specified in the use-attribute-sets attribute. It is an error if use of use-attribute-sets attributes on xsl:attribute-set elements causes an attribute set to directly or indirectly use itself.

    Attribute sets can also be used by specifying an xsl:use-attribute-sets attribute on a literal result element. The value of the xsl:use-attribute-sets attribute is a whitespace-separated list of names of attribute sets. The xsl:use-attribute-sets attribute has the same effect as the use-attribute-sets attribute on xsl:element with the additional rule that attributes specified on the literal result element itself are treated as if they were specified by xsl:attribute elements before any actual xsl:attribute elements but after any xsl:attribute elements implied by the xsl:use-attribute-sets attribute. Thus, for a literal result element, attributes from attribute sets named in an xsl:use-attribute-sets attribute will be added first, in the order listed in the attribute; next, attributes specified on the literal result element will be added; finally, any attributes specified by xsl:attribute elements will be added. Since adding an attribute to an element replaces any existing attribute of that element with the same name, this means that attributes specified in attribute sets can be overridden by attributes specified on the literal result element itself.
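
    The order in which attributes are added to a literal result element can be seen in the following sketch. The attribute set name cell-style, the source element total, and the attribute values are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:attribute-set name="cell-style">
    <xsl:attribute name="align">left</xsl:attribute>
    <xsl:attribute name="class">plain</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="total">
    <!-- 1. cell-style adds align="left" and class="plain".
         2. The literal class attribute replaces class with "total".
         3. The xsl:attribute element replaces align with "right". -->
    <td class="total" xsl:use-attribute-sets="cell-style">
      <xsl:attribute name="align">right</xsl:attribute>
      <xsl:apply-templates/>
    </td>
  </xsl:template>

</xsl:stylesheet>
```

    The created td element therefore has align="right" and class="total".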

    The template within each xsl:attribute element in an xsl:attribute-set element is instantiated each time the attribute set is used; it is instantiated using the same current node and current node list as is used for instantiating the element bearing the use-attribute-sets or xsl:use-attribute-sets attribute. However, it is the position in the stylesheet of the xsl:attribute element rather than of the element bearing the use-attribute-sets or xsl:use-attribute-sets attribute that determines which variable bindings are visible (see [Variables and Parameters]); thus, only variables and parameters declared by top-level xsl:variable and xsl:param elements are visible.

    The following example creates a named attribute set title-style and uses it in a template rule.

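    <xsl:template match="chapter/heading">
      <fo:block quadding="start" xsl:use-attribute-sets="title-style">
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>

    <xsl:attribute-set name="title-style">
      <xsl:attribute name="font-size">12pt</xsl:attribute>
      <xsl:attribute name="font-weight">bold</xsl:attribute>
    </xsl:attribute-set>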

    Multiple definitions of an attribute set with the same expanded-name are merged. An attribute from a definition that has higher import precedence takes precedence over an attribute from a definition that has lower import precedence. It is an error if there are two attribute sets that have the same expanded-name and equal import precedence and that both contain the same attribute, unless there is a definition of the attribute set with higher import precedence that also contains the attribute. An XSLT processor may signal the error; if it does not signal the error, it must recover by choosing from amongst the definitions that specify the attribute that have the highest import precedence the one that was specified last in the stylesheet. Where the attributes in an attribute set were specified is relevant only in merging the attributes into the attribute set; it makes no difference when the attribute set is used.

    Creating Text

    A template can also contain text nodes. Each text node in a template remaining after whitespace has been stripped as specified in [Whitespace Stripping] will create a text node with the same string-value in the result tree. Adjacent text nodes in the result tree are automatically merged.

    Note that text is processed at the tree level. Thus, markup of &lt; in a template will be represented in the stylesheet tree by a text node that includes the character <. This will create a text node in the result tree that contains a < character, which will be represented by the markup &lt; (or an equivalent character reference) when the result tree is externalized as an XML document (unless output escaping is disabled as described in [Disabling Output Escaping]).

    Literal data characters may also be wrapped in an xsl:text element. This wrapping may change what whitespace characters are stripped (see [Whitespace Stripping]) but does not affect how the characters are handled by the XSLT processor thereafter.

    The xml:lang and xml:space attributes are not treated specially by XSLT. In particular,

    it is the responsibility of the stylesheet author explicitly to generate any xml:lang or xml:space attributes that are needed in the result;

    specifying an xml:lang or xml:space attribute on an element in the XSLT namespace will not cause any xml:lang or xml:space attributes to appear in the result.

    Creating Processing Instructions

    The xsl:processing-instruction element is instantiated to create a processing instruction node. The content of the xsl:processing-instruction element is a template for the string-value of the processing instruction node. The xsl:processing-instruction element has a required name attribute that specifies the name of the processing instruction node. The value of the name attribute is interpreted as an attribute value template.

    For example, this

    <xsl:processing-instruction name="xml-stylesheet">href="book.css" type="text/css"</xsl:processing-instruction>

    would create the processing instruction

    <?xml-stylesheet href="book.css" type="text/css"?>

    It is an error if the string that results from instantiating the name attribute is not both an NCName and a PITarget. An XSLT processor may signal the error; if it does not signal the error, it must recover by not adding the processing instruction to the result tree.

    This means that xsl:processing-instruction cannot be used to output an XML declaration. The xsl:output element should be used instead (see [Output]).

    It is an error if instantiating the content of xsl:processing-instruction creates nodes other than text nodes. An XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the offending nodes together with their content.

    It is an error if the result of instantiating the content of the xsl:processing-instruction contains the string ?>. An XSLT processor may signal the error; if it does not signal the error, it must recover by inserting a space after any occurrence of ? that is followed by a >.
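    The recovery rule for ?> is a simple string rewrite. A minimal Python sketch, in which the function name is an assumption chosen for illustration:

```python
def recover_pi_content(text: str) -> str:
    """Insert a space after every '?' that is immediately followed by '>'."""
    return text.replace("?>", "? >")


print(recover_pi_content('href="a.css"?> trailing'))
```

    After the rewrite the content can no longer contain the string ?>, so it cannot terminate the processing instruction early.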

    Creating Comments

    The xsl:comment element is instantiated to create a comment node in the result tree. The content of the xsl:comment element is a template for the string-value of the comment node.

    For example, this

    <xsl:comment>This file is automatically generated. Do not edit!</xsl:comment>

    would create the comment

    <!--This file is automatically generated. Do not edit!-->

    It is an error if instantiating the content of xsl:comment creates nodes other than text nodes. An XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the offending nodes together with their content.

    It is an error if the result of instantiating the content of the xsl:comment contains the string -- or ends with -. An XSLT processor may signal the error; if it does not signal the error, it must recover by inserting a space after any occurrence of - that is followed by another - or that ends the comment.
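    The recovery rule for comments can be expressed with a lookahead. The following Python sketch uses an assumed function name; it inserts a space after each - that is followed by another - or that ends the comment.

```python
import re


def recover_comment_content(text: str) -> str:
    """Insert a space after any '-' followed by '-' or ending the content."""
    # (?=-|\Z) looks ahead for another hyphen or the very end of the string.
    return re.sub(r"-(?=-|\Z)", "- ", text)


print(recover_comment_content("a--b"))
print(repr(recover_comment_content("trailing-")))
```

    The result never contains -- and never ends with -, so it can be serialized between <!-- and --> safely.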

    Copying

    The xsl:copy element provides an easy way of copying the current node. Instantiating the xsl:copy element creates a copy of the current node. The namespace nodes of the current node are automatically copied as well, but the attributes and children of the node are not automatically copied. The content of the xsl:copy element is a template for the attributes and children of the created node; the content is instantiated only for nodes of types that can have attributes or children (i.e. root nodes and element nodes).

    The xsl:copy element may have a use-attribute-sets attribute (see [Named Attribute Sets]). This is used only when copying element nodes.

    The root node is treated specially because the root node of the result tree is created implicitly. When the current node is the root node, xsl:copy will not create a root node, but will just use the content template.

    For example, the identity transformation can be written using xsl:copy as follows:

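    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>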

    When the current node is an attribute, then if it would be an error to use xsl:attribute to create an attribute with the same name as the current node, then it is also an error to use xsl:copy (see [Creating Attributes with xsl:attribute]).

    The following example shows how xml:lang attributes can be easily copied through from source to result. If a stylesheet defines the following named template:

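    <xsl:template name="apply-templates-copy-lang">
     <xsl:for-each select="@xml:lang">
       <xsl:copy/>
     </xsl:for-each>
     <xsl:apply-templates/>
    </xsl:template>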

    then it can simply do

    <xsl:call-template name="apply-templates-copy-lang"/>

    instead of

    <xsl:apply-templates/>

    when it wants to copy the xml:lang attribute.

    Computing Generated Text

    Within a template, the xsl:value-of element can be used to compute generated text, for example by extracting text from the source tree or by inserting the value of a variable. The xsl:value-of element does this with an expression that is specified as the value of the select attribute. Expressions can also be used inside attribute values of literal result elements by enclosing the expression in curly braces ({}).

    Generating Text with xsl:value-of

    The xsl:value-of element is instantiated to create a text node in the result tree. The required select attribute is an expression; this expression is evaluated and the resulting object is converted to a string as if by a call to the string function. The string specifies the string-value of the created text node. If the string is empty, no text node will be created. The created text node will be merged with any adjacent text nodes.

    The xsl:copy-of element can be used to copy a node-set over to the result tree without converting it to a string. See [Using Values of Variables and Parameters with xsl:copy-of].

    For example, the following creates an HTML paragraph from a person element with given-name and family-name attributes. The paragraph will contain the value of the given-name attribute of the current node followed by a space and the value of the family-name attribute of the current node.

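    <xsl:template match="person">
      <p>
       <xsl:value-of select="@given-name"/>
       <xsl:text> </xsl:text>
       <xsl:value-of select="@family-name"/>
      </p>
    </xsl:template>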

    For another example, the following creates an HTML paragraph from a person element with given-name and family-name children elements. The paragraph will contain the string-value of the first given-name child element of the current node followed by a space and the string-value of the first family-name child element of the current node.

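    <xsl:template match="person">
      <p>
       <xsl:value-of select="given-name"/>
       <xsl:text> </xsl:text>
       <xsl:value-of select="family-name"/>
      </p>
    </xsl:template>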

    The following precedes each procedure element with a paragraph containing the security level of the procedure. It assumes that the security level that applies to a procedure is determined by a security attribute on the procedure element or on an ancestor element of the procedure. It also assumes that if more than one such element has a security attribute then the security level is determined by the element that is closest to the procedure.

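    <xsl:template match="procedure">
      <fo:block>
        <xsl:value-of select="ancestor-or-self::*[@security][1]/@security"/>
      </fo:block>
      <xsl:apply-templates/>
    </xsl:template>
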
    Attribute Value Templates

    In an attribute value that is interpreted as an attribute value template, such as an attribute of a literal result element, an expression can be used by surrounding the expression with curly braces ({}). The attribute value template is instantiated by replacing the expression together with surrounding curly braces by the result of evaluating the expression and converting the resulting object to a string as if by a call to the string function. Curly braces are not recognized in an attribute value in an XSLT stylesheet unless the attribute is specifically stated to be one that is interpreted as an attribute value template; in an element syntax summary, the value of such attributes is surrounded by curly braces.

    Not all attributes are interpreted as attribute value templates. Attributes whose value is an expression or pattern, attributes of top-level elements and attributes that refer to named XSLT objects are not interpreted as attribute value templates. In addition, xmlns attributes are not interpreted as attribute value templates; it would not be conformant with the XML Namespaces Recommendation to do this.

    The following example creates an img result element from a photograph element in the source; the value of the src attribute of the img element is computed from the value of the image-dir variable and the string-value of the href child of the photograph element; the value of the width attribute of the img element is computed from the value of the width attribute of the size child of the photograph element:

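    <xsl:variable name="image-dir">/images</xsl:variable>

    <xsl:template match="photograph">
    <img src="{$image-dir}/{href}" width="{size/@width}"/>
    </xsl:template>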

    With this source

    <photograph>
      <href>headquarters.jpg</href>
      <size width="300"/>
    </photograph>

    the result would be

    <img src="/images/headquarters.jpg" width="300"/>

    When an attribute value template is instantiated, a double left or right curly brace outside an expression will be replaced by a single curly brace. It is an error if a right curly brace occurs in an attribute value template outside an expression without being followed by a second right curly brace. A right curly brace inside a Literal in an expression is not recognized as terminating the expression.
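    The brace rules can be made concrete with a small scanner. The Python sketch below is an illustration under stated assumptions: the function name and the ("text", ...) / ("expr", ...) output format are invented for this example, and expressions are only located, not evaluated.

```python
def parse_avt(value):
    """Split an attribute value template into text and expression parts."""
    parts, text = [], []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch == "{":
            if value.startswith("{{", i):
                text.append("{")          # doubled brace -> single brace
                i += 2
                continue
            # Scan to the closing brace, ignoring braces inside Literals.
            j, quote = i + 1, None
            while j < n:
                c = value[j]
                if quote:
                    if c == quote:
                        quote = None
                elif c in "'\"":
                    quote = c
                elif c == "}":
                    break
                elif c == "{":
                    raise ValueError("curly braces are not recognized recursively")
                j += 1
            if j >= n:
                raise ValueError("unterminated expression")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("expr", value[i + 1:j]))
            i = j + 1
        elif ch == "}":
            if not value.startswith("}}", i):
                raise ValueError("right curly brace must be doubled outside an expression")
            text.append("}")
            i += 2
        else:
            text.append(ch)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


print(parse_avt("{$image-dir}/{href}"))
print(parse_avt("a{{b}}"))
print(parse_avt("{concat('}', @x)}"))
```

    A processor would then evaluate each expression part, convert the result to a string, and concatenate all parts in order.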

    Curly braces are not recognized recursively inside expressions. For example:

    <a href="#{id({@ref})/title}">

    is not allowed. Instead, use simply:

    <a href="#{id(@ref)/title}">

    Numbering

    The xsl:number element is used to insert a formatted number into the result tree. The number to be inserted may be specified by an expression. The value attribute contains an expression. The expression is evaluated and the resulting object is converted to a number as if by a call to the number function. The number is rounded to an integer and then converted to a string using the attributes specified in [Number to String Conversion Attributes]; in this context, the value of each of these attributes is interpreted as an attribute value template. After conversion, the resulting string is inserted in the result tree. For example, the following numbers a sorted list:

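    <xsl:template match="items">
      <xsl:for-each select="item">
        <xsl:sort select="."/>
        <p>
          <xsl:number value="position()" format="1. "/>
          <xsl:value-of select="."/>
        </p>
      </xsl:for-each>
    </xsl:template>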

    If no value attribute is specified, then the xsl:number element inserts a number based on the position of the current node in the source tree. The following attributes control how the current node is to be numbered:

    The level attribute specifies what levels of the source tree should be considered; it has the values single, multiple or any. The default is single.

    The count attribute is a pattern that specifies what nodes should be counted at those levels. If the count attribute is not specified, then it defaults to the pattern that matches any node with the same node type as the current node and, if the current node has an expanded-name, with the same expanded-name as the current node.

    The from attribute is a pattern that specifies where counting starts.

    In addition, the attributes specified in [Number to String Conversion Attributes] are used for number to string conversion, as in the case when the value attribute is specified.

    The xsl:number element first constructs a list of positive integers using the level, count and from attributes:

    When level="single", it goes up to the first node in the ancestor-or-self axis that matches the count pattern, and constructs a list of length one containing one plus the number of preceding siblings of that ancestor that match the count pattern. If there is no such ancestor, it constructs an empty list. If the from attribute is specified, then the only ancestors that are searched are those that are descendants of the nearest ancestor that matches the from pattern. Preceding siblings has the same meaning here as with the preceding-sibling axis.

    When level="multiple", it constructs a list of all ancestors of the current node in document order followed by the element itself; it then selects from the list those nodes that match the count pattern; it then maps each node in the list to one plus the number of preceding siblings of that node that match the count pattern. If the from attribute is specified, then the only ancestors that are searched are those that are descendants of the nearest ancestor that matches the from pattern. Preceding siblings has the same meaning here as with the preceding-sibling axis.

    When level="any", it constructs a list of length one containing the number of nodes that match the count pattern and belong to the set containing the current node and all nodes at any level of the document that are before the current node in document order, excluding any namespace and attribute nodes (in other words the union of the members of the preceding and ancestor-or-self axes). If the from attribute is specified, then only nodes after the first node before the current node that match the from pattern are considered.
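    The level="single" and level="multiple" constructions can be sketched over a toy tree. Everything in the following Python example is an assumption made for illustration: the Node class, the function names, and the use of a plain predicate in place of a count pattern. The from attribute and level="any" are left out to keep the sketch short.

```python
class Node:
    """Minimal tree node; children are kept in document order."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def ancestors_or_self(node):
    """Yield the node, then its parent, and so on up to the root."""
    while node is not None:
        yield node
        node = node.parent


def sibling_number(node, count):
    """One plus the number of preceding siblings that match count."""
    if node.parent is None:
        return 1
    siblings = node.parent.children
    return 1 + sum(1 for s in siblings[:siblings.index(node)] if count(s))


def number_single(node, count):
    """level="single": the first matching node on the ancestor-or-self axis."""
    for candidate in ancestors_or_self(node):
        if count(candidate):
            return [sibling_number(candidate, count)]
    return []


def number_multiple(node, count):
    """level="multiple": every matching ancestor-or-self, outermost first."""
    matching = [n for n in ancestors_or_self(node) if count(n)]
    return [sibling_number(n, count) for n in reversed(matching)]


doc = Node("doc")
ch1, ch2 = Node("chapter", doc), Node("chapter", doc)
s1, s2 = Node("section", ch2), Node("section", ch2)
title = Node("title", s2)

in_outline = lambda n: n.name in {"chapter", "section", "subsection"}
print(number_multiple(title, in_outline))
print(number_single(title, lambda n: n.name == "title"))
```

    For the title of the second section in the second chapter, the multiple-level list is [2, 2], which a format of 1.1 would render as 2.2.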

    The list of numbers is then converted into a string using the attributes specified in [Number to String Conversion Attributes]; in this context, the value of each of these attributes is interpreted as an attribute value template. After conversion, the resulting string is inserted in the result tree.

    The following would number the items in an ordered list:

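    <xsl:template match="ol/item">
      <fo:block>
        <xsl:number/><xsl:text>. </xsl:text><xsl:apply-templates/>
      </fo:block>
    </xsl:template>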

    The following two rules would number title elements. This is intended for a document that contains a sequence of chapters followed by a sequence of appendices, where both chapters and appendices contain sections, which in turn contain subsections. Chapters are numbered 1, 2, 3; appendices are numbered A, B, C; sections in chapters are numbered 1.1, 1.2, 1.3; sections in appendices are numbered A.1, A.2, A.3.

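    <xsl:template match="title">
      <fo:block>
         <xsl:number level="multiple"
                     count="chapter|section|subsection"
                     format="1.1 "/>
         <xsl:apply-templates/>
      </fo:block>
    </xsl:template>

    <xsl:template match="appendix//title" priority="1">
      <fo:block>
         <xsl:number level="multiple"
                     count="appendix|section|subsection"
                     format="A.1 "/>
         <xsl:apply-templates/>
      </fo:block>
    </xsl:template>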

    The following example numbers notes sequentially within a chapter:

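    <xsl:template match="note">
      <fo:block>
         <xsl:number level="any" from="chapter" format="(1) "/>
         <xsl:apply-templates/>
      </fo:block>
    </xsl:template>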

    The following example would number H4 elements in HTML with a three-part label:

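    <xsl:template match="H4">
     <fo:block>
       <xsl:number level="any" from="H1" count="H2"/>
       <xsl:text>.</xsl:text>
       <xsl:number level="any" from="H2" count="H3"/>
       <xsl:text>.</xsl:text>
       <xsl:number level="any" from="H3" count="H4"/>
       <xsl:text> </xsl:text>
       <xsl:apply-templates/>
     </fo:block>
    </xsl:template>

    Number to String Conversion Attributes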

    The following attributes are used to control conversion of a list of numbers into a string. The numbers are integers greater than 0. The attributes are all optional.

    The main attribute is format. The default value for the format attribute is 1. The format attribute is split into a sequence of tokens where each token is a maximal sequence of alphanumeric characters or a maximal sequence of non-alphanumeric characters. Alphanumeric means any character that has a Unicode category of Nd, Nl, No, Lu, Ll, Lt, Lm or Lo. The alphanumeric tokens (format tokens) specify the format to be used for each number in the list. If the first token is a non-alphanumeric token, then the constructed string will start with that token; if the last token is a non-alphanumeric token, then the constructed string will end with that token. Non-alphanumeric tokens that occur between two format tokens are separator tokens that are used to join numbers in the list. The nth format token will be used to format the nth number in the list. If there are more numbers than format tokens, then the last format token will be used to format remaining numbers. If there are no format tokens, then a format token of 1 is used to format all numbers. The format token specifies the string to be used to represent the number 1. Each number after the first will be separated from the preceding number by the separator token preceding the format token used to format that number, or, if there are no separator tokens, then by . (a period character).

    Format tokens are a superset of the allowed values for the type attribute for the OL element in HTML 4.0 and are interpreted as follows:

    Any token where the last character has a decimal digit value of 1 (as specified in the Unicode character property database), and the Unicode value of preceding characters is one less than the Unicode value of the last character generates a decimal representation of the number where each number is at least as long as the format token. Thus, a format token 1 generates the sequence 1 2 ... 10 11 12 ..., and a format token 01 generates the sequence 01 02 ... 09 10 11 12 ... 99 100 101.

    A format token A generates the sequence A B C ... Z AA AB AC....

    A format token a generates the sequence a b c ... z aa ab ac....

    A format token i generates the sequence i ii iii iv v vi vii viii ix x ....

    A format token I generates the sequence I II III IV V VI VII VIII IX X ....

    Any other format token indicates a numbering sequence that starts with that token. If an implementation does not support a numbering sequence that starts with that token, it must use a format token of 1.
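    The tokenizing and formatting rules above can be sketched in Python. This is a simplified illustration, not a conforming implementation: the function names are assumptions, only ASCII decimal tokens such as 1 and 01 are recognized, and any other unsupported token falls back to a format token of 1. A format consisting of a single non-alphanumeric token is treated here as a prefix only, which is one possible reading of the rule.

```python
import re
import unicodedata

ALPHANUMERIC = {"Nd", "Nl", "No", "Lu", "Ll", "Lt", "Lm", "Lo"}


def tokenize(fmt):
    """Split fmt into maximal runs as (is_alphanumeric, text) pairs."""
    tokens = []
    for ch in fmt:
        kind = unicodedata.category(ch) in ALPHANUMERIC
        if tokens and tokens[-1][0] == kind:
            tokens[-1] = (kind, tokens[-1][1] + ch)
        else:
            tokens.append((kind, ch))
    return tokens


def alphabetic(n, first):
    """A B ... Z AA AB ... (bijective base 26), starting at `first`."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord(first) + rem) + out
    return out


def roman(n):
    table = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in table:
        while n >= value:
            out += symbol
            n -= value
    return out


def format_one(n, token):
    if token in ("A", "a"):
        return alphabetic(n, token)
    if token == "i":
        return roman(n)
    if token == "I":
        return roman(n).upper()
    if re.fullmatch(r"0*1", token):
        return str(n).zfill(len(token))   # pad to the token's length
    return str(n)                          # fallback: format token of 1


def format_numbers(numbers, fmt="1"):
    tokens = tokenize(fmt)
    has_prefix = bool(tokens) and not tokens[0][0]
    has_suffix = len(tokens) > 1 and not tokens[-1][0]
    prefix = tokens[0][1] if has_prefix else ""
    suffix = tokens[-1][1] if has_suffix else ""
    body = tokens[1:] if has_prefix else tokens
    if has_suffix:
        body = body[:-1]
    formats = [text for alnum, text in body if alnum] or ["1"]
    separators = [text for alnum, text in body if not alnum]
    out = prefix
    for i, n in enumerate(numbers):
        k = min(i, len(formats) - 1)       # reuse the last format token
        if i > 0:
            out += separators[k - 1] if separators else "."
        out += format_one(n, formats[k])
    return out + suffix


print(repr(format_numbers([1, 2, 3], "1.1 ")))
print(repr(format_numbers([1, 2], "A.1 ")))
print(repr(format_numbers([3], "(1) ")))
```

    With format="1.1 " the third number reuses the last format token and the separator that precedes it, giving "1.2.3 ".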

    When numbering with an alphabetic sequence, the lang attribute specifies which language's alphabet is to be used; it has the same range of values as xml:lang [XML]; if no lang value is specified, the language should be determined from the system environment. Implementers should document for which languages they support numbering.

    Implementers should not make any assumptions about how numbering works in particular languages and should properly research the languages that they wish to support. The numbering conventions of many languages are very different from English.

    The letter-value attribute disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. A value of alphabetic specifies the alphabetic sequence; a value of traditional specifies the other sequence. If the letter-value attribute is not specified, then it is implementation-dependent how any ambiguity is resolved.

    It is possible for two conforming XSLT processors not to convert a number to exactly the same string. Some XSLT processors may not support some languages. Furthermore, there may be variations possible in the way conversions are performed for any particular language that are not specifiable by the attributes on xsl:number. Future versions of XSLT may provide additional attributes to provide control over these variations. Implementations may also use implementation-specific namespaced attributes on xsl:number for this.

    The grouping-separator attribute gives the separator used as a grouping (e.g. thousands) separator in decimal numbering sequences, and the optional grouping-size specifies the size (normally 3) of the grouping. For example, grouping-separator="," and grouping-size="3" would produce numbers of the form 1,000,000. If only one of the grouping-separator and grouping-size attributes is specified, then it is ignored.
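    Grouping applies to the decimal digits from the right. A minimal Python sketch, with an assumed function name, operating on a string of digits that has already been produced:

```python
def group_digits(digits, separator=",", size=3):
    """Insert `separator` between groups of `size` digits, counted from the right."""
    groups = []
    while digits:
        groups.insert(0, digits[-size:])
        digits = digits[:-size]
    return separator.join(groups)


print(group_digits("1000000"))
```

    This corresponds to grouping-separator="," with grouping-size="3"; when only one of the two attributes is given, a processor performs no grouping at all.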

    Here are some examples of conversion specifications:

    format="&#x30A2;" specifies Katakana numbering

    format="&#x30A4;" specifies Katakana numbering in the iroha order

    format="&#x0E51;" specifies numbering with Thai digits

    format="&#x05D0;" letter-value="traditional" specifies traditional Hebrew numbering

    format="&#x10D0;" letter-value="traditional" specifies Georgian numbering

    format="&#x03B1;" letter-value="traditional" specifies classical Greek numbering

    format="&#x0430;" letter-value="traditional" specifies Old Slavic numbering

    Repetition

    When the result has a known regular structure, it is useful to be able to specify directly the template for selected nodes. The xsl:for-each instruction contains a template, which is instantiated for each node selected by the expression specified by the select attribute. The select attribute is required. The expression must evaluate to a node-set. The template is instantiated with the selected node as the current node, and with a list of all of the selected nodes as the current node list. The nodes are processed in document order, unless a sorting specification is present (see [Sorting]).

    For example, given an XML document with this structure

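    <customers>
      <customer>
        <name>...</name>
        <order>...</order>
        <order>...</order>
      </customer>
      <customer>
        <name>...</name>
        <order>...</order>
        <order>...</order>
      </customer>
    </customers>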

    the following would create an HTML document containing a table with a row for each customer element

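    <xsl:template match="/">
      <html>
        <head>
          <title>Customers</title>
        </head>
        <body>
          <table>
            <tbody>
              <xsl:for-each select="customers/customer">
                <tr>
                  <th>
                    <xsl:apply-templates select="name"/>
                  </th>
                  <xsl:for-each select="order">
                    <td>
                      <xsl:apply-templates/>
                    </td>
                  </xsl:for-each>
                </tr>
              </xsl:for-each>
            </tbody>
          </table>
        </body>
      </html>
    </xsl:template>
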
    Conditional Processing

    There are two instructions in XSLT that support conditional processing in a template: xsl:if and xsl:choose. The xsl:if instruction provides simple if-then conditionality; the xsl:choose instruction supports selection of one choice when there are several possibilities.

    Conditional Processing with xsl:if

    The xsl:if element has a test attribute, which specifies an expression. The content is a template. The expression is evaluated and the resulting object is converted to a boolean as if by a call to the boolean function. If the result is true, then the content template is instantiated; otherwise, nothing is created. In the following example, the names in a group of names are formatted as a comma separated list:

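    <xsl:template match="namelist/name">
      <xsl:apply-templates/>
      <xsl:if test="not(position()=last())">, </xsl:if>
    </xsl:template>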

    The following colors every other table row yellow:

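    <xsl:template match="item">
      <tr>
        <xsl:if test="position() mod 2 = 0">
           <xsl:attribute name="bgcolor">yellow</xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
      </tr>
    </xsl:template>
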
    Conditional Processing with xsl:choose

    The xsl:choose element selects one among a number of possible alternatives. It consists of a sequence of xsl:when elements followed by an optional xsl:otherwise element. Each xsl:when element has a single attribute, test, which specifies an expression. The content of the xsl:when and xsl:otherwise elements is a template. When an xsl:choose element is processed, each of the xsl:when elements is tested in turn, by evaluating the expression and converting the resulting object to a boolean as if by a call to the boolean function. The content of the first, and only the first, xsl:when element whose test is true is instantiated. If no xsl:when is true, the content of the xsl:otherwise element is instantiated. If no xsl:when element is true, and no xsl:otherwise element is present, nothing is created.

    The following example enumerates items in an ordered list using arabic numerals, letters, or roman numerals depending on the depth to which the ordered lists are nested.

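    <xsl:template match="orderedlist/listitem">
      <fo:list-item indent-start='2pi'>
        <fo:list-item-label>
          <xsl:variable name="level"
                        select="count(ancestor::orderedlist) mod 3"/>
          <xsl:choose>
            <xsl:when test='$level=1'>
              <xsl:number format="i"/>
            </xsl:when>
            <xsl:when test='$level=2'>
              <xsl:number format="a"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:number format="1"/>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:text>. </xsl:text>
        </fo:list-item-label>
        <fo:list-item-body>
          <xsl:apply-templates/>
        </fo:list-item-body>
      </fo:list-item>
    </xsl:template>
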
    Sorting

    Sorting is specified by adding xsl:sort elements as children of an xsl:apply-templates or xsl:for-each element. The first xsl:sort child specifies the primary sort key, the second xsl:sort child specifies the secondary sort key and so on. When an xsl:apply-templates or xsl:for-each element has one or more xsl:sort children, then instead of processing the selected nodes in document order, it sorts the nodes according to the specified sort keys and then processes them in sorted order. When used in xsl:for-each, xsl:sort elements must occur first. When a template is instantiated by xsl:apply-templates or xsl:for-each, the current node list consists of the complete list of nodes being processed in sorted order.

    xsl:sort has a select attribute whose value is an expression. For each node to be processed, the expression is evaluated with that node as the current node and with the complete list of nodes being processed in unsorted order as the current node list. The resulting object is converted to a string as if by a call to the string function; this string is used as the sort key for that node. The default value of the select attribute is ., which will cause the string-value of the current node to be used as the sort key.

    The following optional attributes on xsl:sort control how the list of sort keys is sorted; the values of all of these attributes are interpreted as attribute value templates.

    order specifies whether the strings should be sorted in ascending or descending order; ascending specifies ascending order; descending specifies descending order; the default is ascending

    lang specifies the language of the sort keys; it has the same range of values as xml:lang [XML]; if no lang value is specified, the language should be determined from the system environment

    data-type specifies the data type of the strings; the following values are allowed:

    text specifies that the sort keys should be sorted lexicographically in the culturally correct manner for the language specified by lang

    number specifies that the sort keys should be converted to numbers and then sorted according to the numeric value; the sort key is converted to a number as if by a call to the number function; the lang attribute is ignored

    a QName with a prefix is expanded into an expanded-name as described in [Qualified Names]; the expanded-name identifies the data-type; the behavior in this case is not specified by this document

    The default value is text.

    The XSL Working Group plans that future versions of XSLT will leverage XML Schemas to define further values for this attribute.

    case-order has the value upper-first or lower-first; this applies when data-type="text", and specifies that upper-case letters should sort before lower-case letters or vice-versa respectively. For example, if lang="en", then A a B b are sorted with case-order="upper-first" and a A b B are sorted with case-order="lower-first". The default value is language dependent.

    It is possible for two conforming XSLT processors not to sort exactly the same. Some XSLT processors may not support some languages. Furthermore, there may be variations possible in the sorting of any particular language that are not specified by the attributes on xsl:sort, for example, whether Hiragana or Katakana is sorted first in Japanese. Future versions of XSLT may provide additional attributes to provide control over these variations. Implementations may also use implementation-specific namespaced attributes on xsl:sort for this.

    It is recommended that implementers consult Unicode Technical Report #10 (Unicode Collation Algorithm) for information on internationalized sorting.

    The sort must be stable: in the sorted list of nodes, any sub list that has sort keys that all compare equal must be in document order.
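    Multiple sort keys and the stability requirement can be modelled with any stable sort. The Python sketch below is illustrative: the function name and the (select, data_type, order) tuples are assumptions, and plain string comparison stands in for the culturally correct ordering that lang and case-order would control.

```python
def xsl_sort(nodes, sort_keys):
    """Sort nodes by a list of (select, data_type, order) keys.

    Sorting by the least significant key first and relying on a stable
    sort gives the same result as one comparison over all keys.
    """
    result = list(nodes)                      # starts in document order
    for select, data_type, order in reversed(sort_keys):
        convert = float if data_type == "number" else str
        result.sort(key=lambda node: convert(select(node)),
                    reverse=(order == "descending"))
    return result


employees = [
    {"given": "James", "family": "Clark"},
    {"given": "Ann", "family": "Smith"},
    {"given": "Bob", "family": "Clark"},
]
family = lambda e: e["family"]
given = lambda e: e["given"]

by_name = xsl_sort(employees, [(family, "text", "ascending"),
                               (given, "text", "ascending")])
print([e["given"] + " " + e["family"] for e in by_name])

# With only the family key, the two Clarks keep their document order.
by_family = xsl_sort(employees, [(family, "text", "ascending")])
print([e["given"] for e in by_family])
```

    Python's list.sort is stable, so nodes whose keys compare equal remain in document order, as the requirement above demands.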

    For example, suppose an employee database has the form

    <employees>
      <employee>
        <name>
          <given>James</given>
          <family>Clark</family>
        </name>
        ...
      </employee>
    </employees>

    Then a list of employees sorted by name could be generated using:

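    <xsl:template match="employees">
      <ul>
        <xsl:apply-templates select="employee">
          <xsl:sort select="name/family"/>
          <xsl:sort select="name/given"/>
        </xsl:apply-templates>
      </ul>
    </xsl:template>

    <xsl:template match="employee">
      <li>
        <xsl:value-of select="name/given"/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="name/family"/>
      </li>
    </xsl:template>
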
    Variables and Parameters

    A variable is a name that may be bound to a value. The value to which a variable is bound (the value of the variable) can be an object of any of the types that can be returned by expressions. There are two elements that can be used to bind variables: xsl:variable and xsl:param. The difference is that the value specified on the xsl:param variable is only a default value for the binding; when the template or stylesheet within which the xsl:param element occurs is invoked, parameters may be passed that are used in place of the default values.

    Both xsl:variable and xsl:param have a required name attribute, which specifies the name of the variable. The value of the name attribute is a QName, which is expanded as described in [Qualified Names].

    For any use of these variable-binding elements, there is a region of the stylesheet tree within which the binding is visible; within this region, any binding of the variable that was visible on the variable-binding element itself is hidden. Thus, only the innermost binding of a variable is visible. The set of variable bindings in scope for an expression consists of those bindings that are visible at the point in the stylesheet where the expression occurs.

    Result Tree Fragments

    Variables introduce an additional data-type into the expression language. This additional data type is called result tree fragment. A variable may be bound to a result tree fragment instead of one of the four basic XPath data-types (string, number, boolean, node-set). A result tree fragment represents a fragment of the result tree. A result tree fragment is treated equivalently to a node-set that contains just a single root node. However, the operations permitted on a result tree fragment are a subset of those permitted on a node-set. An operation is permitted on a result tree fragment only if that operation would be permitted on a string (the operation on the string may involve first converting the string to a number or boolean). In particular, it is not permitted to use the /, //, and [] operators on result tree fragments. When a permitted operation is performed on a result tree fragment, it is performed exactly as it would be on the equivalent node-set.

    When a result tree fragment is copied into the result tree (see [Using Values of Variables and Parameters with xsl:copy-of]), then all the nodes that are children of the root node in the equivalent node-set are added in sequence to the result tree.

    Expressions can only return values of type result tree fragment by referencing variables of type result tree fragment or calling extension functions that return a result tree fragment or getting a system property whose value is a result tree fragment.

    Values of Variables and Parameters

    A variable-binding element can specify the value of the variable in three alternative ways.

    If the variable-binding element has a select attribute, then the value of the attribute must be an expression and the value of the variable is the object that results from evaluating the expression. In this case, the content must be empty.

    If the variable-binding element does not have a select attribute and has non-empty content (i.e. the variable-binding element has one or more child nodes), then the content of the variable-binding element specifies the value. The content of the variable-binding element is a template, which is instantiated to give the value of the variable. The value is a result tree fragment equivalent to a node-set containing just a single root node having as children the sequence of nodes produced by instantiating the template. The base URI of the nodes in the result tree fragment is the base URI of the variable-binding element.

    It is an error if a member of the sequence of nodes created by instantiating the template is an attribute node or a namespace node, since a root node cannot have an attribute node or a namespace node as a child. An XSLT processor may signal the error; if it does not signal the error, it must recover by not adding the attribute node or namespace node.

    If the variable-binding element has empty content and does not have a select attribute, then the value of the variable is an empty string. Thus

    <xsl:variable name="x"/>

    is equivalent to

    <xsl:variable name="x" select="''"/>

    When a variable is used to select nodes by position, be careful not to do:

    <xsl:variable name="n">2</xsl:variable>
    ...
    <xsl:value-of select="item[$n]"/>

    This will output the value of the first item element, because the variable n will be bound to a result tree fragment, not a number. Instead, do either

    <xsl:variable name="n" select="2"/>
    ...
    <xsl:value-of select="item[$n]"/>

    or

    <xsl:variable name="n">2</xsl:variable>
    ...
    <xsl:value-of select="item[position()=$n]"/>

    One convenient way to specify the empty node-set as the default value of a parameter is:

    <xsl:param name="x" select="/.."/>

    Using Values of Variables and Parameters with xsl:copy-of

    The xsl:copy-of element can be used to insert a result tree fragment into the result tree, without first converting it to a string as xsl:value-of does (see the Generating Text with xsl:value-of section). The required select attribute contains an expression. When the result of evaluating the expression is a result tree fragment, the complete fragment is copied into the result tree. When the result is a node-set, all the nodes in the set are copied in document order into the result tree; copying an element node copies the attribute nodes, namespace nodes and children of the element node as well as the element node itself; a root node is copied by copying its children. When the result is neither a node-set nor a result tree fragment, the result is converted to a string and then inserted into the result tree, as with xsl:value-of.
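
    The difference between xsl:copy-of and xsl:value-of can be sketched as follows. The variable name notice and the wrapper element names are hypothetical and serve only to separate the three cases.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="notice"><em>Draft</em> copy</xsl:variable>

  <xsl:template match="/">
    <out>
      <!-- xsl:copy-of inserts the fragment as nodes, so the em element survives -->
      <as-nodes><xsl:copy-of select="$notice"/></as-nodes>
      <!-- xsl:value-of converts to a string first, so only the text "Draft copy" remains -->
      <as-string><xsl:value-of select="$notice"/></as-string>
      <!-- a number is neither a node-set nor a fragment: it is converted to a string -->
      <as-number><xsl:copy-of select="1 + 2"/></as-number>
    </out>
  </xsl:template>

</xsl:stylesheet>
```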

    Top-level Variables and Parameters

    Both xsl:variable and xsl:param are allowed as top-level elements. A top-level variable-binding element declares a global variable that is visible everywhere. A top-level xsl:param element declares a parameter to the stylesheet; XSLT does not define the mechanism by which parameters are passed to the stylesheet. It is an error if a stylesheet contains more than one binding of a top-level variable with the same name and same import precedence. At the top-level, the expression or template specifying the variable value is evaluated with the same context as that used to process the root node of the source document: the current node is the root node of the source document and the current node list is a list containing just the root node of the source document. If the template or expression specifying the value of a global variable x references a global variable y, then the value for y must be computed before the value of x. It is an error if it is impossible to do this for all global variable definitions; in other words, it is an error if the definitions are circular.

    This example declares a global variable para-font-size, which it references in an attribute value template.

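    <xsl:variable name="para-font-size">12pt</xsl:variable>

    <xsl:template match="para">
      <fo:block font-size="{$para-font-size}">
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>
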
    Variables and Parameters within Templates

    As well as being allowed at the top-level, both xsl:variable and xsl:param are also allowed in templates. xsl:variable is allowed anywhere within a template that an instruction is allowed. In this case, the binding is visible for all following siblings and their descendants. Note that the binding is not visible for the xsl:variable element itself. xsl:param is allowed as a child at the beginning of an xsl:template element. In this context, the binding is visible for all following siblings and their descendants. Note that the binding is not visible for the xsl:param element itself.

    A binding shadows another binding if the binding occurs at a point where the other binding is visible, and the bindings have the same name. It is an error if a binding established by an xsl:variable or xsl:param element within a template shadows another binding established by an xsl:variable or xsl:param element also within the template. It is not an error if a binding established by an xsl:variable or xsl:param element in a template shadows another binding established by an xsl:variable or xsl:param top-level element. Thus, the following is an error:

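    <xsl:template name="foo">
      <xsl:param name="x" select="1"/>
      <xsl:variable name="x" select="2"/>
    </xsl:template>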

    However, the following is allowed:

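    <xsl:param name="x" select="1"/>
    <xsl:template name="foo">
      <xsl:variable name="x" select="2"/>
    </xsl:template>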

    The nearest equivalent in Java to an xsl:variable element in a template is a final local variable declaration with an initializer. For example,

    <xsl:variable name="x" select="'value'"/>

    has similar semantics to

    final Object x = "value";

    XSLT does not provide an equivalent to the Java assignment operator

    x = "value";

    because this would make it harder to create an implementation that processes a document other than in a batch-like way, starting at the beginning and continuing through to the end.

    Passing Parameters to Templates

    Parameters are passed to templates using the xsl:with-param element. The required name attribute specifies the name of the parameter (the variable the value of whose binding is to be replaced). The value of the name attribute is a QName, which is expanded as described in the Qualified Names section. xsl:with-param is allowed within both xsl:call-template and xsl:apply-templates. The value of the parameter is specified in the same way as for xsl:variable and xsl:param. The current node and current node list used for computing the value specified by the xsl:with-param element are the same as those used for the xsl:apply-templates or xsl:call-template element within which it occurs. It is not an error to pass a parameter x to a template that does not have an xsl:param element for x; the parameter is simply ignored.

    This example defines a named template for a numbered-block with an argument to control the format of the number.

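    <xsl:template name="numbered-block">
      <xsl:param name="format">1. </xsl:param>
      <fo:block>
        <xsl:number format="{$format}"/>
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>

    <xsl:template match="ol//ol/li">
      <xsl:call-template name="numbered-block">
        <xsl:with-param name="format">a. </xsl:with-param>
      </xsl:call-template>
    </xsl:template>
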
    Additional Functions

    This section describes XSLT-specific additions to the core XPath function library. Some of these additional functions also make use of information specified by top-level elements in the stylesheet; this section also describes these elements.

    Multiple Source Documents

    The document function allows access to XML documents other than the main source document.

    When the document function has exactly one argument and the argument is a node-set, then the result is the union, for each node in the argument node-set, of the result of calling the document function with the first argument being the string-value of the node, and the second argument being a node-set with the node as its only member. When the document function has two arguments and the first argument is a node-set, then the result is the union, for each node in the argument node-set, of the result of calling the document function with the first argument being the string-value of the node, and with the second argument being the second argument passed to the document function.

    When the first argument to the document function is not a node-set, the first argument is converted to a string as if by a call to the string function. This string is treated as a URI reference; the resource identified by the URI is retrieved. The data resulting from the retrieval action is parsed as an XML document and a tree is constructed in accordance with the data model (see the Data Model section). If there is an error retrieving the resource, then the XSLT processor may signal an error; if it does not signal an error, it must recover by returning an empty node-set. One possible kind of retrieval error is that the XSLT processor does not support the URI scheme used by the URI. An XSLT processor is not required to support any particular URI schemes. The documentation for an XSLT processor should specify which URI schemes the XSLT processor supports.

    If the URI reference does not contain a fragment identifier, then a node-set containing just the root node of the document is returned. If the URI reference does contain a fragment identifier, the function returns a node-set containing the nodes in the tree identified by the fragment identifier of the URI reference. The semantics of the fragment identifier is dependent on the media type of the result of retrieving the URI. If there is an error in processing the fragment identifier, the XSLT processor may signal the error; if it does not signal the error, it must recover by returning an empty node-set. Possible errors include:

    The fragment identifier identifies something that cannot be represented by an XSLT node-set (such as a range of characters within a text node).

    The XSLT processor does not support fragment identifiers for the media-type of the retrieval result. An XSLT processor is not required to support any particular media types. The documentation for an XSLT processor should specify for which media types the XSLT processor supports fragment identifiers.

    The data resulting from the retrieval action is parsed as an XML document regardless of the media type of the retrieval result; if the top-level media type is text, then it is parsed in the same way as if the media type were text/xml; otherwise, it is parsed in the same way as if the media type were application/xml.

    Since there is no top-level xml media type, data with a media type other than text/xml or application/xml may in fact be XML.

    The URI reference may be relative. The base URI (see the Base URI section) of the node in the second argument node-set that is first in document order is used as the base URI for resolving the relative URI into an absolute URI. If the second argument is omitted, then it defaults to the node in the stylesheet that contains the expression that includes the call to the document function. Note that a zero-length URI reference is a reference to the document relative to which the URI reference is being resolved; thus document("") refers to the root node of the stylesheet; the tree representation of the stylesheet is exactly the same as if the XML document containing the stylesheet was the initial source document.
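
    A minimal sketch of two common uses of the document function follows. The my namespace, the lookup table, and the quantity and include elements are hypothetical; only the behavior of document itself is taken from the text above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:my="urn:example:lookup"
                exclude-result-prefixes="my">

  <!-- Hypothetical lookup table stored as a non-XSLT top-level element -->
  <my:units>
    <my:unit code="kg">kilogram</my:unit>
    <my:unit code="m">metre</my:unit>
  </my:units>

  <xsl:template match="quantity">
    <xsl:variable name="code" select="@unit"/>
    <quantity>
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <!-- document('') returns the root node of the stylesheet itself -->
      <xsl:value-of select="document('')/*/my:units/my:unit[@code=$code]"/>
    </quantity>
  </xsl:template>

  <!-- Node-set argument: a relative URI in href is resolved against the
       base URI of the href attribute node, that is, the source document -->
  <xsl:template match="include">
    <xsl:copy-of select="document(@href)"/>
  </xsl:template>

</xsl:stylesheet>
```

    Writing document(string(@href)) instead would resolve a relative URI against the stylesheet, because the argument is then a string and the omitted second argument defaults to the stylesheet node containing the expression.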

    Two documents are treated as the same document if they are identified by the same URI. The URI used for the comparison is the absolute URI into which any relative URI was resolved and does not include any fragment identifier. One root node is treated as the same node as another root node if the two nodes are from the same document. Thus, the following expression will always be true:

    generate-id(document("foo.xml"))=generate-id(document("foo.xml"))

    The document function gives rise to the possibility that a node-set may contain nodes from more than one document. With such a node-set, the relative document order of two nodes in the same document is the normal document order defined by XPath. The relative document order of two nodes in different documents is determined by an implementation-dependent ordering of the documents containing the two nodes. There are no constraints on how the implementation orders documents other than that it must do so consistently: an implementation must always use the same order for the same set of documents.

    Keys

    Keys provide a way to work with documents that contain an implicit cross-reference structure. The ID, IDREF and IDREFS attribute types in XML provide a mechanism to allow XML documents to make their cross-reference explicit. XSLT supports this through the XPath id function. However, this mechanism has a number of limitations:

    ID attributes must be declared as such in the DTD. If an ID attribute is declared as an ID attribute only in the external DTD subset, then it will be recognized as an ID attribute only if the XML processor reads the external DTD subset. However, XML does not require XML processors to read the external DTD, and they may well choose not to do so, especially if the document is declared standalone="yes".

    A document can contain only a single set of unique IDs. There cannot be separate independent sets of unique IDs.

    The ID of an element can only be specified in an attribute; it cannot be specified by the content of the element, or by a child element.

    An ID is constrained to be an XML name. For example, it cannot contain spaces.

    An element can have at most one ID.

    At most one element can have a particular ID.

    Because of these limitations XML documents sometimes contain a cross-reference structure that is not explicitly declared by ID/IDREF/IDREFS attributes.

    A key is a triple containing:

    the node which has the key

    the name of the key (an expanded-name)

    the value of the key (a string)

    A stylesheet declares a set of keys for each document using the xsl:key element. When this set of keys contains a member with node x, name y and value z, we say that node x has a key with name y and value z.

    Thus, a key is a kind of generalized ID, which is not subject to the same limitations as an XML ID:

    Keys are declared in the stylesheet using xsl:key elements.

    A key has a name as well as a value; each key name may be thought of as distinguishing a separate, independent space of identifiers.

    The value of a named key for an element may be specified in any convenient place; for example, in an attribute, in a child element or in content. An XPath expression is used to specify where to find the value for a particular named key.

    The value of a key can be an arbitrary string; it is not constrained to be a name.

    There can be multiple keys in a document with the same node, same key name, but different key values.

    There can be multiple keys in a document with the same key name, same key value, but different nodes.

    The xsl:key element is used to declare keys. The name attribute specifies the name of the key. The value of the name attribute is a QName, which is expanded as described in the Qualified Names section. The match attribute is a Pattern; an xsl:key element gives information about the keys of any node that matches the pattern specified in the match attribute. The use attribute is an expression specifying the values of the key; the expression is evaluated once for each node that matches the pattern. If the result is a node-set, then for each node in the node-set, the node that matches the pattern has a key of the specified name whose value is the string-value of the node in the node-set; otherwise, the result is converted to a string, and the node that matches the pattern has a key of the specified name with value equal to that string. Thus, a node x has a key with name y and value z if and only if there is an xsl:key element such that:

    x matches the pattern specified in the match attribute of the xsl:key element;

    the value of the name attribute of the xsl:key element is equal to y; and

    when the expression specified in the use attribute of the xsl:key element is evaluated with x as the current node and with a node list containing just x as the current node list resulting in an object u, then either z is equal to the result of converting u to a string as if by a call to the string function, or u is a node-set and z is equal to the string-value of one or more of the nodes in u.

    Note also that there may be more than one xsl:key element that matches a given node; all of the matching xsl:key elements are used, even if they do not have the same import precedence.

    It is an error for the value of either the use attribute or the match attribute to contain a VariableReference.
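
    The case where the use expression yields a node-set can be sketched as follows. The book, author and entry elements and the key name by-author are hypothetical. A book with several author children has one key per author, so it is returned by a lookup on any of them.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- use="author" is a node-set: each author child contributes one key value -->
  <xsl:key name="by-author" match="book" use="author"/>

  <xsl:template match="author-index/entry">
    <entry name="{@name}">
      <!-- every book in this document that lists @name among its authors -->
      <xsl:for-each select="key('by-author', @name)">
        <title><xsl:value-of select="title"/></title>
      </xsl:for-each>
    </entry>
  </xsl:template>

</xsl:stylesheet>
```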

    The key function does for keys what the id function does for IDs. The first argument specifies the name of the key. The value of the argument must be a QName, which is expanded as described in the Qualified Names section. When the second argument to the key function is of type node-set, then the result is the union of the result of applying the key function to the string value of each of the nodes in the argument node-set. When the second argument to key is of any other type, the argument is converted to a string as if by a call to the string function; the function returns a node-set containing the nodes in the same document as the context node that have a value for the named key equal to this string.

    For example, given a declaration

    <xsl:key name="idkey" match="div" use="@id"/>

    an expression key("idkey",@ref) will return the same node-set as id(@ref), assuming that the only ID attribute declared in the XML source document is:

    <!ATTLIST div id ID #IMPLIED>

    and that the ref attribute of the current node contains no whitespace.

    Suppose a document describing a function library uses a prototype element to define functions

    <prototype name="key" return-type="node-set">
      <arg type="string"/>
      <arg type="object"/>
    </prototype>

    and a function element to refer to function names

    <function>key</function>

    Then the stylesheet could generate hyperlinks between the references and definitions as follows:

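    <xsl:key name="func" match="prototype" use="@name"/>

    <xsl:template match="function">
      <b>
        <a href="#{generate-id(key('func',.))}">
          <xsl:apply-templates/>
        </a>
      </b>
    </xsl:template>

    <xsl:template match="prototype">
      <p>
        <a name="{generate-id()}">
          <b>Function: </b>
          ...
        </a>
      </p>
    </xsl:template>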

    The key function can be used to retrieve a key from a document other than the document containing the context node. For example, suppose a document contains bibliographic references in the form <bibref>XSLT</bibref>, and there is a separate XML document bib.xml containing a bibliographic database with entries in the form:

    <entry name="XSLT">...</entry>

    Then the stylesheet could use the following to transform the bibref elements:

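    <xsl:key name="bib" match="entry" use="@name"/>

    <xsl:template match="bibref">
      <xsl:variable name="name" select="."/>
      <xsl:for-each select="document('bib.xml')">
        <xsl:apply-templates select="key('bib',$name)"/>
      </xsl:for-each>
    </xsl:template>
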
    Number Formatting

    The format-number function converts its first argument to a string using the format pattern string specified by the second argument and the decimal-format named by the third argument, or the default decimal-format, if there is no third argument. The format pattern string is in the syntax specified by the JDK 1.1 DecimalFormat class. The format pattern string is in a localized notation: the decimal-format determines what characters have a special meaning in the pattern (with the exception of the quote character, which is not localized). The format pattern must not contain the currency sign (#x00A4); support for this feature was added after the initial release of JDK 1.1. The decimal-format name must be a QName, which is expanded as described in the Qualified Names section. It is an error if the stylesheet does not contain a declaration of the decimal-format with the specified expanded-name.

    Implementations are not required to use the JDK 1.1 implementation, nor are implementations required to be implemented in Java.

    Stylesheets can use other facilities in XPath to control rounding.

    The xsl:decimal-format element declares a decimal-format, which controls the interpretation of a format pattern used by the format-number function. If there is a name attribute, then the element declares a named decimal-format; otherwise, it declares the default decimal-format. The value of the name attribute is a QName, which is expanded as described in the Qualified Names section. It is an error to declare either the default decimal-format or a decimal-format with a given name more than once (even with different import precedence), unless it is declared every time with the same value for all attributes (taking into account any default values).

    The other attributes on xsl:decimal-format correspond to the methods on the JDK 1.1 DecimalFormatSymbols class. For each get/set method pair there is an attribute defined for the xsl:decimal-format element.

    The following attributes both control the interpretation of characters in the format pattern and specify characters that may appear in the result of formatting the number:

    decimal-separator specifies the character used for the decimal sign; the default value is the period character (.)

    grouping-separator specifies the character used as a grouping (e.g. thousands) separator; the default value is the comma character (,)

    percent specifies the character used as a percent sign; the default value is the percent character (%)

    per-mille specifies the character used as a per mille sign; the default value is the Unicode per-mille character (#x2030)

    zero-digit specifies the character used as the digit zero; the default value is the digit zero (0)

    The following attributes control the interpretation of characters in the format pattern:

    digit specifies the character used for a digit in the format pattern; the default value is the number sign character (#)

    pattern-separator specifies the character used to separate positive and negative sub patterns in a pattern; the default value is the semi-colon character (;)

    The following attributes specify characters or strings that may appear in the result of formatting the number:

    infinity specifies the string used to represent infinity; the default value is the string Infinity

    NaN specifies the string used to represent the NaN value; the default value is the string NaN

    minus-sign specifies the character used as the default minus sign; the default value is the hyphen-minus character (-, #x2D)
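
    The interaction between xsl:decimal-format and format-number can be sketched as follows. The decimal-format name european and the amount element are hypothetical. Because the pattern is in localized notation, the pattern passed with the named decimal-format uses the comma and period in their swapped roles.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:decimal-format name="european"
                      decimal-separator=","
                      grouping-separator="."/>

  <xsl:template match="amount">
    <amount>
      <!-- default decimal-format: 1234567.891 is formatted as 1,234,567.89 -->
      <default>
        <xsl:value-of select="format-number(., '#,##0.00')"/>
      </default>
      <!-- named decimal-format: 1234567.891 is formatted as 1.234.567,89 -->
      <european>
        <xsl:value-of select="format-number(., '#.##0,00', 'european')"/>
      </european>
    </amount>
  </xsl:template>

</xsl:stylesheet>
```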

    Miscellaneous Additional Functions

    The current function returns a node-set that has the current node as its only member. For an outermost expression (an expression not occurring within another expression), the current node is always the same as the context node. Thus,

    <xsl:value-of select="current()"/>

    means the same as

    <xsl:value-of select="."/>

    However, within square brackets the current node is usually different from the context node. For example,

    <xsl:apply-templates select="//glossary/item[@name=current()/@ref]"/>

    will process all item elements that have a glossary parent element and that have a name attribute with value equal to the value of the current node's ref attribute. This is different from

    <xsl:apply-templates select="//glossary/item[@name=./@ref]"/>

    which means the same as

    <xsl:apply-templates select="//glossary/item[@name=@ref]"/>

    and so would process all item elements that have a glossary parent element and that have a name attribute and a ref attribute with the same value.

    It is an error to use the current function in a pattern.

    The unparsed-entity-uri function returns the URI of the unparsed entity with the specified name in the same document as the context node (see the Unparsed Entities section). It returns the empty string if there is no such entity.

    The generate-id function returns a string that uniquely identifies the node in the argument node-set that is first in document order. The unique identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name. An implementation is free to generate an identifier in any convenient way provided that it always generates the same identifier for the same node and that different identifiers are always generated from different nodes. An implementation is under no obligation to generate the same identifiers each time a document is transformed. There is no guarantee that a generated unique identifier will be distinct from any unique IDs specified in the source document. If the argument node-set is empty, the empty string is returned. If the argument is omitted, it defaults to the context node.
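
    Because the same node always yields the same identifier and different nodes yield different identifiers, comparing generate-id results is a way to test node identity. The sketch below uses this together with a key to emit one element per distinct city, a technique often called Muenchian grouping. The people and person elements and the key name by-city are hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="by-city" match="person" use="@city"/>

  <xsl:template match="people">
    <cities>
      <!-- generate-id on a node-set uses the node that is first in document
           order, so the predicate keeps only the first person of each city -->
      <xsl:for-each
          select="person[generate-id() = generate-id(key('by-city', @city))]">
        <city name="{@city}" count="{count(key('by-city', @city))}"/>
      </xsl:for-each>
    </cities>
  </xsl:template>

</xsl:stylesheet>
```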

    The argument to the system-property function must evaluate to a string that is a QName. The QName is expanded into a name using the namespace declarations in scope for the expression. The function returns an object representing the value of the system property identified by the name. If there is no such system property, the empty string should be returned.

    Implementations must provide the following system properties, which are all in the XSLT namespace:

    xsl:version, a number giving the version of XSLT implemented by the processor; for XSLT processors implementing the version of XSLT specified by this document, this is the number 1.0

    xsl:vendor, a string identifying the vendor of the XSLT processor

    xsl:vendor-url, a string containing a URL identifying the vendor of the XSLT processor; typically this is the host page (home page) of the vendor's Web site.
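
    A short sketch of reading the three required properties follows. The generator element and its attribute names are hypothetical; the property names are the ones listed above, and the xsl prefix in the argument is resolved using the namespace declarations in scope.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- record which processor produced the result -->
    <generator version="{system-property('xsl:version')}"
               vendor="{system-property('xsl:vendor')}"
               vendor-url="{system-property('xsl:vendor-url')}"/>
  </xsl:template>

</xsl:stylesheet>
```
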
    Messages

    The xsl:message instruction sends a message in a way that is dependent on the XSLT processor. The content of the xsl:message instruction is a template. The xsl:message is instantiated by instantiating the content to create an XML fragment. This XML fragment is the content of the message.

    An XSLT processor might implement xsl:message by popping up an alert box or by writing to a log file.

    If the terminate attribute has the value yes, then the XSLT processor should terminate processing after sending the message. The default value is no.
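
    For example, a stylesheet might stop as soon as it meets input it cannot handle. This is only a sketch: the order element and its id attribute are hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- an order without an id attribute cannot be processed -->
  <xsl:template match="order[not(@id)]">
    <xsl:message terminate="yes">
      <xsl:text>Order without an id attribute at position </xsl:text>
      <xsl:value-of select="position()"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```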

    One convenient way to do localization is to put the localized information (message text, etc.) in an XML document, which becomes an additional input file to the stylesheet. For example, suppose messages for a language L are stored in an XML file resources/L.xml in the form:

    <messages>
      <message name="problem">A problem was detected.</message>
      <message name="error">An error was detected.</message>
    </messages>

    Then a stylesheet could use the following approach to localize messages:

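    <xsl:param name="lang" select="'en'"/>
    <xsl:variable name="messages"
      select="document(concat('resources/', $lang, '.xml'))/messages"/>

    <xsl:template name="localized-message">
      <xsl:param name="name"/>
      <xsl:message>
        <xsl:value-of select="$messages/message[@name=$name]"/>
      </xsl:message>
    </xsl:template>

    <xsl:template name="problem">
      <xsl:call-template name="localized-message">
        <xsl:with-param name="name">problem</xsl:with-param>
      </xsl:call-template>
    </xsl:template>
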
    Extensions

    XSLT allows two kinds of extension, extension elements and extension functions.

    This version of XSLT does not provide a mechanism for defining implementations of extensions. Therefore, an XSLT stylesheet that must be portable between XSLT implementations cannot rely on particular extensions being available. XSLT provides mechanisms that allow an XSLT stylesheet to determine whether the XSLT processor by which it is being processed has implementations of particular extensions available, and to specify what should happen if those extensions are not available. If an XSLT stylesheet is careful to make use of these mechanisms, it is possible for it to take advantage of extensions and still work with any XSLT implementation.

    Extension Elements

    The element extension mechanism allows namespaces to be designated as extension namespaces. When a namespace is designated as an extension namespace and an element with a name from that namespace occurs in a template, then the element is treated as an instruction rather than as a literal result element. The namespace determines the semantics of the instruction.

    Since an element that is a child of an xsl:stylesheet element is not occurring in a template, non-XSLT top-level elements are not extension elements as defined here, and nothing in this section applies to them.

    A namespace is designated as an extension namespace by using an extension-element-prefixes attribute on an xsl:stylesheet element or an xsl:extension-element-prefixes attribute on a literal result element or extension element. The value of both these attributes is a whitespace-separated list of namespace prefixes. The namespace bound to each of the prefixes is designated as an extension namespace. It is an error if there is no namespace bound to the prefix on the element bearing the extension-element-prefixes or xsl:extension-element-prefixes attribute. The default namespace (as declared by xmlns) may be designated as an extension namespace by including #default in the list of namespace prefixes. The designation of a namespace as an extension namespace is effective within the subtree of the stylesheet rooted at the element bearing the extension-element-prefixes or xsl:extension-element-prefixes attribute; a subtree rooted at an xsl:stylesheet element does not include any stylesheets imported or included by children of that xsl:stylesheet element.

    If the XSLT processor does not have an implementation of a particular extension element available, then the element-available function must return false for the name of the element. When such an extension element is instantiated, then the XSLT processor must perform fallback for the element as specified in the Fallback section. An XSLT processor must not signal an error merely because a template contains an extension element for which no implementation is available.

    If the XSLT processor has an implementation of a particular extension element available, then the element-available function must return true for the name of the element.
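
    The following sketch shows how a namespace might be designated as an extension namespace and how a stylesheet can remain usable when the extension is missing. The namespace URI urn:example:extensions, the ext:write-file element and its href attribute are hypothetical and do not correspond to any particular processor.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ext="urn:example:extensions"
                extension-element-prefixes="ext">

  <xsl:template match="chapter">
    <!-- treated as an instruction, not as a literal result element -->
    <ext:write-file href="{@id}.html">
      <html><body><xsl:apply-templates/></body></html>
      <!-- instantiated only when the processor performs fallback -->
      <xsl:fallback>
        <div class="chapter"><xsl:apply-templates/></div>
      </xsl:fallback>
    </ext:write-file>
  </xsl:template>

</xsl:stylesheet>
```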

    Extension Functions

    If a FunctionName in a FunctionCall expression is not an NCName (i.e. if it contains a colon), then it is treated as a call to an extension function. The FunctionName is expanded to a name using the namespace declarations from the evaluation context.

    If the XSLT processor does not have an implementation of an extension function of a particular name available, then the function-available function must return false for that name. If such an extension function occurs in an expression and the extension function is actually called, the XSLT processor must signal an error. An XSLT processor must not signal an error merely because an expression contains an extension function for which no implementation is available.

    If the XSLT processor has an implementation of an extension function of a particular name available, then the function-available function must return true for that name. If such an extension is called, then the XSLT processor must call the implementation passing it the function call arguments; the result returned by the implementation is returned as the result of the function call.

    Fallback

    Normally, instantiating an xsl:fallback element does nothing. However, when an XSLT processor performs fallback for an instruction element, if the instruction element has one or more xsl:fallback children, then the content of each of the xsl:fallback children must be instantiated in sequence; otherwise, an error must be signaled. The content of an xsl:fallback element is a template.

    The following functions can be used with the xsl:choose and xsl:if instructions to explicitly control how a stylesheet should behave if particular elements or functions are not available.

    The argument to the element-available function must evaluate to a string that is a QName. The QName is expanded into an expanded-name using the namespace declarations in scope for the expression. The function returns true if and only if the expanded-name is the name of an instruction. If the expanded-name has a namespace URI equal to the XSLT namespace URI, then it refers to an element defined by XSLT. Otherwise, it refers to an extension element. If the expanded-name has a null namespace URI, the function returns false.

    The argument to the function-available function must evaluate to a string that is a QName. The QName is expanded into an expanded-name using the namespace declarations in scope for the expression. The function returns true if and only if the expanded-name is the name of a function in the function library. If the expanded-name has a non-null namespace URI, then it refers to an extension function; otherwise, it refers to a function defined by XPath or XSLT.
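
    Both functions can be combined with xsl:choose and xsl:if as sketched below. The extension namespace, the ext:format-currency function, the ext:log element and the source element names are hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ext="urn:example:extensions"
                extension-element-prefixes="ext">

  <xsl:template match="price">
    <xsl:choose>
      <!-- call the extension function only if the processor provides it -->
      <xsl:when test="function-available('ext:format-currency')">
        <xsl:value-of select="ext:format-currency(., @currency)"/>
      </xsl:when>
      <!-- otherwise stay within the standard function library -->
      <xsl:otherwise>
        <xsl:value-of select="format-number(., '#,##0.00')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="log-entry">
    <!-- instantiate the extension element only when it is implemented -->
    <xsl:if test="element-available('ext:log')">
      <ext:log level="info"><xsl:value-of select="."/></ext:log>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```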

    Output

    An XSLT processor may output the result tree as a sequence of bytes, although it is not required to be able to do so (see the Conformance section). The xsl:output element allows stylesheet authors to specify how they wish the result tree to be output. If an XSLT processor outputs the result tree, it should do so as specified by the xsl:output element; however, it is not required to do so.

    The xsl:output element is only allowed as a top-level element.

    The method attribute on xsl:output identifies the overall method that should be used for outputting the result tree. The value must be a QName. If the QName does not have a prefix, then it identifies a method specified in this document and must be one of xml, html or text. If the QName has a prefix, then the QName is expanded into an expanded-name as described in the Qualified Names section; the expanded-name identifies the output method; the behavior in this case is not specified by this document.

    The default for the method attribute is chosen as follows. If

    the root node of the result tree has an element child,

    the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and

    any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,

    then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.

    The other attributes on xsl:output provide parameters for the output method. The following attributes are allowed:

    version specifies the version of the output method

    indent specifies whether the XSLT processor may add additional whitespace when outputting the result tree; the value must be yes or no

    encoding specifies the preferred character encoding that the XSLT processor should use to encode sequences of characters as sequences of bytes; the value of the attribute should be treated case-insensitively; the value must contain only characters in the range #x21 to #x7E (i.e. printable ASCII characters); the value should either be a charset registered with the Internet Assigned Numbers Authority, or start with X-

    media-type specifies the media type (MIME content type) of the data that results from outputting the result tree; the charset parameter should not be specified explicitly; instead, when the top-level media type is text, a charset parameter should be added according to the character encoding actually used by the output method

    doctype-system specifies the system identifier to be used in the document type declaration

    doctype-public specifies the public identifier to be used in the document type declaration

    omit-xml-declaration specifies whether the XSLT processor should output an XML declaration; the value must be yes or no

    standalone specifies whether the XSLT processor should output a standalone document declaration; the value must be yes or no

    cdata-section-elements specifies a list of the names of elements whose text node children should be output using CDATA sections

    The detailed semantics of each attribute will be described separately for each output method for which it is applicable. If the semantics of an attribute are not described for an output method, then it is not applicable to that output method.

    A stylesheet may contain multiple xsl:output elements and may include or import stylesheets that also contain xsl:output elements. All the xsl:output elements occurring in a stylesheet are merged into a single effective xsl:output element. For the cdata-section-elements attribute, the effective value is the union of the specified values. For other attributes, the effective value is the specified value with the highest import precedence. It is an error if there is more than one such value for an attribute. An XSLT processor may signal the error; if it does not signal the error, it should recover by using the value that occurs last in the stylesheet. The values of attributes are defaulted after the xsl:output elements have been merged; different output methods may have different default values for an attribute.
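
    A minimal sketch of the merging rule, assuming each xsl:output element is represented as a pair of a numeric import precedence (higher wins) and a dictionary of its attributes, listed in stylesheet order. The representation and the name merge_output are hypothetical. For simplicity the sketch compares cdata-section-elements names as raw strings rather than as expanded-names, and it always takes the recovery path instead of signalling the error.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (import_precedence, attributes), in stylesheet order.
OutputDecl = Tuple[int, Dict[str, str]]


def merge_output(decls: List[OutputDecl]) -> Dict[str, str]:
    effective: Dict[str, str] = {}
    best_precedence: Dict[str, int] = {}
    cdata_names: List[str] = []
    for precedence, attrs in decls:
        for name, value in attrs.items():
            if name == "cdata-section-elements":
                # Effective value is the union of all specified names.
                for qname in value.split():
                    if qname not in cdata_names:
                        cdata_names.append(qname)
            elif precedence >= best_precedence.get(name, precedence):
                # Higher import precedence wins. On a tie (the error case)
                # this sketch recovers by keeping the value that occurs last.
                effective[name] = value
                best_precedence[name] = precedence
    if cdata_names:
        effective["cdata-section-elements"] = " ".join(cdata_names)
    return effective
```

    Defaults for unspecified attributes would be applied to the returned dictionary afterwards, because they depend on the effective output method.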

    XML Output Method

    The xml output method outputs the result tree as a well-formed XML external general parsed entity. If the root node of the result tree has a single element node child and no text node children, then the entity should also be a well-formed XML document entity. When the entity is referenced within a trivial XML document wrapper like this

    <!DOCTYPE doc [
    <!ENTITY e SYSTEM "entity-URI">
    ]>
    <doc>&e;</doc>

    where entity-URI is a URI for the entity, then the wrapper document as a whole should be a well-formed XML document conforming to the XML Namespaces Recommendation . In addition, the output should be such that if a new tree was constructed by parsing the wrapper as an XML document as specified in , and then removing the document element, making its children instead be children of the root node, then the new tree would be the same as the result tree, with the following possible exceptions:

    The order of attributes in the two trees may be different.

    The new tree may contain namespace nodes that were not present in the result tree.

    An XSLT processor may need to add namespace declarations in the course of outputting the result tree as XML.

    If the XSLT processor generated a document type declaration because of the doctype-system attribute, then the above requirements apply to the entity with the generated document type declaration removed.

    The version attribute specifies the version of XML to be used for outputting the result tree. If the XSLT processor does not support this version of XML, it should use a version of XML that it does support. The version output in the XML declaration (if an XML declaration is output) should correspond to the version of XML that the processor used for outputting the result tree. The value of the version attribute should match the VersionNum production of the XML Recommendation . The default value is 1.0.

    The encoding attribute specifies the preferred encoding to use for outputting the result tree. XSLT processors are required to respect values of UTF-8 and UTF-16. For other values, if the XSLT processor does not support the specified encoding it may signal an error; if it does not signal an error it should use UTF-8 or UTF-16 instead. The XSLT processor must not use an encoding whose name does not match the EncName production of the XML Recommendation . If no encoding attribute is specified, then the XSLT processor should use either UTF-8 or UTF-16. It is possible that the result tree will contain a character that cannot be represented in the encoding that the XSLT processor is using for output. In this case, if the character occurs in a context where XML recognizes character references (i.e. in the value of an attribute node or text node), then the character should be output as a character reference; otherwise (for example if the character occurs in the name of an element) the XSLT processor should signal an error.
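
    The fallback for unrepresentable characters can be illustrated with Python's standard codec error handlers. This is a sketch under stated assumptions, not a processor implementation: the function names are hypothetical, and only the minimal escaping of & and < is shown.

```python
def encode_text_node(text: str, encoding: str = "UTF-8") -> bytes:
    """Escape markup characters, then encode; characters the encoding
    cannot represent become decimal character references."""
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    return escaped.encode(encoding, errors="xmlcharrefreplace")


def encode_element_name(name: str, encoding: str = "UTF-8") -> bytes:
    """Character references are not recognized inside names, so strict
    encoding is used: an unrepresentable character raises UnicodeEncodeError."""
    return name.encode(encoding, errors="strict")
```

    With iso-8859-1 as the output encoding, a text node containing U+20AC is written using the reference &#8364;, while the same character in an element name raises an error.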

    If the indent attribute has the value yes, then the xml output method may output whitespace in addition to the whitespace in the result tree (possibly based on whitespace stripped from either the source document or the stylesheet) in order to indent the result nicely; if the indent attribute has the value no, it should not output any additional whitespace. The default value is no. The xml output method should use an algorithm to output additional whitespace that ensures that the result if whitespace were to be stripped from the output using the process described in with the set of whitespace-preserving elements consisting of just xsl:text would be the same when additional whitespace is output as when additional whitespace is not output.

    It is usually not safe to use indent="yes" with document types that include element types with mixed content.

    The cdata-section-elements attribute contains a whitespace-separated list of QNames. Each QName is expanded into an expanded-name using the namespace declarations in effect on the xsl:output element in which the QName occurs; if there is a default namespace, it is used for QNames that do not have a prefix. The expansion is performed before the merging of multiple xsl:output elements into a single effective xsl:output element. If the expanded-name of the parent of a text node is a member of the list, then the text node should be output as a CDATA section. For example,

    <xsl:output cdata-section-elements="example"/>

    would cause a literal result element written in the stylesheet as

    <example>&lt;foo></example>

    or as

    <example><![CDATA[<foo>]]></example>

    to be output as

    <example><![CDATA[<foo>]]></example>

    If the text node contains the sequence of characters ]]>, then the currently open CDATA section should be closed following the ]] and a new CDATA section opened before the >. For example, a literal result element written in the stylesheet as

    <example>]]&gt;</example>

    would be output as

    <example><![CDATA[]]]]><![CDATA[>]]></example>

    If the text node contains a character that is not representable in the character encoding being used to output the result tree, then the currently open CDATA section should be closed before the character, the character should be output using a character reference or entity reference, and a new CDATA section should be opened for any further characters in the text node.

    CDATA sections should not be used except for text nodes that the cdata-section-elements attribute explicitly specifies should be output using CDATA sections.
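
    The two splitting rules for CDATA sections (around the sequence ]]> and around unrepresentable characters) can be sketched as follows. The function names are hypothetical, and the sketch always uses a decimal character reference for an unrepresentable character.

```python
def _representable(ch: str, encoding: str) -> bool:
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def _wrap(run: str) -> str:
    # Close the section after "]]" and reopen it before ">".
    return "<![CDATA[" + run.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def cdata_sections(text: str, encoding: str = "UTF-8") -> str:
    """Serialize a text node whose parent is listed in cdata-section-elements."""
    out = []
    run = ""  # characters collected for the currently open section
    for ch in text:
        if _representable(ch, encoding):
            run += ch
        else:
            if run:
                out.append(_wrap(run))
                run = ""
            # Outside any CDATA section a character reference is recognized.
            out.append("&#%d;" % ord(ch))
    if run:
        out.append(_wrap(run))
    return "".join(out)
```

    Applied to the text ]]> this produces <![CDATA[]]]]><![CDATA[>]]>, matching the example above.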

    The xml output method should output an XML declaration unless the omit-xml-declaration attribute has the value yes. The XML declaration should include both version information and an encoding declaration. If the standalone attribute is specified, it should include a standalone document declaration with the same value as the value of the standalone attribute. Otherwise, it should not include a standalone document declaration; this ensures that it is both an XML declaration (allowed at the beginning of a document entity) and a text declaration (allowed at the beginning of an external general parsed entity).
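
    As a sketch, the declaration can be assembled from the effective xsl:output attributes like this. The function name and its parameters are illustrative assumptions; standalone=None stands for an unspecified standalone attribute.

```python
from typing import Optional


def xml_declaration(version: str = "1.0",
                    encoding: str = "UTF-8",
                    standalone: Optional[str] = None,
                    omit: bool = False) -> str:
    """Build the XML declaration for the xml output method."""
    if omit:  # omit-xml-declaration="yes"
        return ""
    decl = '<?xml version="%s" encoding="%s"' % (version, encoding)
    if standalone is not None:
        # Only emitted when the standalone attribute was specified.
        decl += ' standalone="%s"' % standalone
    return decl + "?>"
```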

    If the doctype-system attribute is specified, the xml output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be the name of the first element. If the doctype-public attribute is also specified, then the xml output method should output PUBLIC followed by the public identifier and then the system identifier; otherwise, it should output SYSTEM followed by the system identifier. The internal subset should be empty. The doctype-public attribute should be ignored unless the doctype-system attribute is specified.
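
    A minimal sketch of this rule, assuming the identifiers contain no double-quote characters (so that they can always be quoted with "). The function name is hypothetical.

```python
from typing import Optional


def xml_doctype(first_element_name: str,
                doctype_system: Optional[str] = None,
                doctype_public: Optional[str] = None) -> str:
    """Document type declaration for the xml output method."""
    if doctype_system is None:
        # doctype-public on its own is ignored.
        return ""
    if doctype_public is not None:
        return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
            first_element_name, doctype_public, doctype_system)
    return '<!DOCTYPE %s SYSTEM "%s">' % (first_element_name, doctype_system)
```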

    The media-type attribute is applicable for the xml output method. The default value for the media-type attribute is text/xml.

    HTML Output Method

    The html output method outputs the result tree as HTML; for example,

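    <xsl:stylesheet version="1.0"
                    xmlns:xsl="&XSLT.ns;">

    <xsl:output method="html"/>

    <xsl:template match="/">
      <html>
       <xsl:apply-templates/>
      </html>
    </xsl:template>

    </xsl:stylesheet>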

    The version attribute indicates the version of the HTML. The default value is 4.0, which specifies that the result should be output as HTML conforming to the HTML 4.0 Recommendation .

    The html output method should not output an element differently from the xml output method unless the expanded-name of the element has a null namespace URI; an element whose expanded-name has a non-null namespace URI should be output as XML. If the expanded-name of the element has a null namespace URI, but the local part of the expanded-name is not recognized as the name of an HTML element, the element should be output in the same way as a non-empty, inline element such as span.

    The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in the stylesheet should be output as <br>.

    The html output method should recognize the names of HTML elements regardless of case. For example, elements named br, BR or Br should all be recognized as the HTML br element and output without an end-tag.
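
    The handling of empty elements and of element-name case can be sketched as below. The function is a simplified illustration: it ignores attributes and namespace declarations, and content is assumed to be already serialized. The set of names is the HTML 4.0 list given above.

```python
HTML4_EMPTY = frozenset(
    "area base basefont br col frame hr img input isindex link meta param".split()
)


def html_element(name: str, content: str = "", namespace_uri: str = "") -> str:
    """Serialize one element the way the html output method would."""
    if namespace_uri:
        # Non-null namespace URI: output as XML.
        if not content:
            return "<%s/>" % name
        return "<%s>%s</%s>" % (name, content, name)
    if name.lower() in HTML4_EMPTY:
        # Recognized regardless of case; no end-tag is written.
        return "<%s>" % name
    # Everything else, including unrecognized names, gets an end-tag.
    return "<%s>%s</%s>" % (name, content, name)
```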

    The html output method should not perform escaping for the content of the script and style elements. For example, a literal result element written in the stylesheet as

    <script>if (a &lt; b) foo()</script>

    or

    <script><![CDATA[if (a < b) foo()]]></script>

    should be output as

    <script>if (a < b) foo()</script>

    The html output method should not escape < characters occurring in attribute values.

    If the indent attribute has the value yes, then the html output method may add or remove whitespace as it outputs the result tree, so long as it does not change how an HTML user agent would render the output. The default value is yes.

    The html output method should escape non-ASCII characters in URI attribute values using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation.

    The html output method may output a character using a character entity reference, if one is defined for it in the version of HTML that the output method is using.

    The html output method should terminate processing instructions with > rather than ?>.

    The html output method should output boolean attributes (that is attributes with only a single allowed value that is equal to the name of the attribute) in minimized form. For example, a start-tag written in the stylesheet as

    <OPTION selected="selected">

    should be output as

    <OPTION selected>

    The html output method should not escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML 4.0 Recommendation). For example, a start-tag written in the stylesheet as

    <BODY bgcolor='&amp;{{randomrbg}};'>

    should be output as

    <BODY bgcolor='&{randomrbg};'>

    The encoding attribute specifies the preferred encoding to be used. If there is a HEAD element, then the html output method should add a META element immediately after the start-tag of the HEAD element specifying the character encoding actually used. For example,

    <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
    ...

    It is possible that the result tree will contain a character that cannot be represented in the encoding that the XSLT processor is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character should be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the XSLT processor should signal an error.

    If the doctype-public or doctype-system attributes are specified, then the html output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be HTML or html. If the doctype-public attribute is specified, then the output method should output PUBLIC followed by the specified public identifier; if the doctype-system attribute is also specified, it should also output the specified system identifier following the public identifier. If the doctype-system attribute is specified but the doctype-public attribute is not specified, then the output method should output SYSTEM followed by the specified system identifier.
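
    The html rule differs from the xml rule in that a public identifier may appear without a system identifier. A minimal sketch, assuming the identifiers contain no double-quote characters and choosing the lower-case name html (the text allows either case); the function name is hypothetical.

```python
from typing import Optional


def html_doctype(doctype_public: Optional[str] = None,
                 doctype_system: Optional[str] = None) -> str:
    """Document type declaration for the html output method."""
    if doctype_public is None and doctype_system is None:
        return ""
    if doctype_public is not None:
        decl = '<!DOCTYPE html PUBLIC "%s"' % doctype_public
        if doctype_system is not None:
            # System identifier follows the public identifier.
            decl += ' "%s"' % doctype_system
        return decl + ">"
    return '<!DOCTYPE html SYSTEM "%s">' % doctype_system
```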

    The media-type attribute is applicable for the html output method. The default value is text/html.

    Text Output Method

    The text output method outputs the result tree by outputting the string-value of every text node in the result tree in document order without any escaping.

    The media-type attribute is applicable for the text output method. The default value for the media-type attribute is text/plain.

    The encoding attribute identifies the encoding that the text output method should use to convert sequences of characters to sequences of bytes. The default is system-dependent. If the result tree contains a character that cannot be represented in the encoding that the XSLT processor is using for output, the XSLT processor should signal an error.
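
    The text output method amounts to concatenating string-values and encoding them strictly. A sketch, assuming the text nodes are supplied as a list of strings in document order and using UTF-8 as a stand-in for the system-dependent default; the function name is hypothetical.

```python
from typing import List


def text_output(text_nodes: List[str], encoding: str = "UTF-8") -> bytes:
    """Concatenate text nodes without escaping and encode the result.

    Strict encoding raises UnicodeEncodeError for a character that the
    encoding cannot represent, which corresponds to signalling the error.
    """
    return "".join(text_nodes).encode(encoding, errors="strict")
```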

    Disabling Output Escaping

    Normally, the xml output method escapes & and < (and possibly other characters) when outputting text nodes. This ensures that the output is well-formed XML. However, it is sometimes convenient to be able to produce output that is almost, but not quite well-formed XML; for example, the output may include ill-formed sections which are intended to be transformed into well-formed XML by a subsequent non-XML aware process. For this reason, XSLT provides a mechanism for disabling output escaping. An xsl:value-of or xsl:text element may have a disable-output-escaping attribute; the allowed values are yes or no; the default is no; if the value is yes, then a text node generated by instantiating the xsl:value-of or xsl:text element should be output without any escaping. For example,

    <xsl:text disable-output-escaping="yes">&lt;</xsl:text>

    should generate the single character <.
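
    The effect of the attribute on a single text node can be sketched as follows. The function name and the boolean flag are illustrative assumptions, and only the minimal escaping of & and < is shown.

```python
def output_text_node(text: str, disable_output_escaping: bool = False) -> str:
    """Serialize one text node for the xml output method."""
    if disable_output_escaping:
        # Written as-is; the result may not be well-formed XML.
        return text
    return text.replace("&", "&amp;").replace("<", "&lt;")
```

    With escaping disabled, the text node < is written as the single character <; with the default, it is written as &lt;.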

    It is an error for output escaping to be disabled for a text node that is used for something other than a text node in the result tree. Thus, it is an error to disable output escaping for an xsl:value-of or xsl:text element that is used to generate the string-value of a comment, processing instruction or attribute node; it is also an error to convert a result tree fragment to a number or a string if the result tree fragment contains a text node for which escaping was disabled. In both cases, an XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the disable-output-escaping attribute.

    The disable-output-escaping attribute may be used with the html output method as well as with the xml output method. The text output method ignores the disable-output-escaping attribute, since it does not perform any output escaping.

    An XSLT processor will only be able to disable output escaping if it controls how the result tree is output. This may not always be the case. For example, the result tree may be used as the source tree for another XSLT transformation instead of being output. An XSLT processor is not required to support disabling output escaping. If an xsl:value-of or xsl:text specifies that output escaping should be disabled and the XSLT processor does not support this, the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.

    If output escaping is disabled for a character that is not representable in the encoding that the XSLT processor is using for output, then the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.

    Since disabling output escaping may not work with all XSLT processors and can result in XML that is not well-formed, it should be used only when there is no alternative.

    Conformance

    A conforming XSLT processor must be able to use a stylesheet to transform a source tree into a result tree as specified in this document. A conforming XSLT processor need not be able to output the result in XML or in any other form.

    Vendors of XSLT processors are strongly encouraged to provide a way to verify that their processor is behaving conformingly by allowing the result tree to be output as XML or by providing access to the result tree through a standard API such as the DOM or SAX.

    A conforming XSLT processor must signal any errors except for those that this document specifically allows an XSLT processor not to signal. A conforming XSLT processor may but need not recover from any errors that it signals.

    A conforming XSLT processor may impose limits on the processing resources consumed by the processing of a stylesheet.

    Notation

    The specification of each XSLT-defined element type is preceded by a summary of its syntax in the form of a model for elements of that element type. The meaning of syntax summary notation is as follows:

    An attribute is required if and only if its name is in bold.

    The string that occurs in the place of an attribute value specifies the allowed values of the attribute. If this is surrounded by curly braces, then the attribute value is treated as an attribute value template, and the string occurring within curly braces specifies the allowed values of the result of instantiating the attribute value template. Alternative allowed values are separated by |. A quoted string indicates a value equal to that specific string. An unquoted, italicized name specifies a particular type of value.

    If the element is allowed not to be empty, then the element contains a comment specifying the allowed content. The allowed content is specified in a similar way to an element type declaration in XML; template means that any mixture of text nodes, literal result elements, extension elements, and XSLT elements from the instruction category is allowed; top-level-elements means that any mixture of XSLT elements from the top-level-element category is allowed.

    The element is prefaced by comments indicating if it belongs to the instruction category or top-level-element category or both. The category of an element just affects whether it is allowed in the content of elements that allow a template or top-level-elements.

    References

    Normative References

    World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation. See http://www.w3.org/TR/1998/REC-xml-19980210

    World Wide Web Consortium. Namespaces in XML. W3C Recommendation. See http://www.w3.org/TR/REC-xml-names

    World Wide Web Consortium. XML Path Language. W3C Recommendation. See http://www.w3.org/TR/xpath

    Other References

    World Wide Web Consortium. Cascading Style Sheets, level 2 (CSS2). W3C Recommendation. See http://www.w3.org/TR/1998/REC-CSS2-19980512

    International Organization for Standardization, International Electrotechnical Commission. ISO/IEC 10179:1996. Document Style Semantics and Specification Language (DSSSL). International Standard.

    World Wide Web Consortium. HTML 4.0 specification. W3C Recommendation. See http://www.w3.org/TR/REC-html40

    Internet Assigned Numbers Authority. Character Sets. See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

    N. Freed, J. Postel. IANA Charset Registration Procedures. IETF RFC 2278. See http://www.ietf.org/rfc/rfc2278.txt.

    E. Whitehead, M. Murata. XML Media Types. IETF RFC 2376. See http://www.ietf.org/rfc/rfc2376.txt.

    T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax. IETF RFC 2396. See http://www.ietf.org/rfc/rfc2396.txt.

    Unicode Consortium. Unicode Technical Report #10. Unicode Collation Algorithm. Unicode Technical Report. See http://www.unicode.org/unicode/reports/tr10/index.html.

    World Wide Web Consortium. XHTML 1.0: The Extensible HyperText Markup Language. W3C Proposed Recommendation. See http://www.w3.org/TR/xhtml1

    World Wide Web Consortium. XML Pointer Language (XPointer). W3C Working Draft. See http://www.w3.org/TR/xptr

    World Wide Web Consortium. Associating stylesheets with XML documents. W3C Recommendation. See http://www.w3.org/TR/xml-stylesheet

    World Wide Web Consortium. Extensible Stylesheet Language (XSL). W3C Working Draft. See http://www.w3.org/TR/WD-xsl

    Element Syntax Summary

    DTD Fragment for XSLT Stylesheets

    This DTD Fragment is not normative because XML 1.0 DTDs do not support XML Namespaces and thus cannot correctly describe the allowed structure of an XSLT stylesheet.

    The following entity can be used to construct a DTD for XSLT stylesheets that create instances of a particular result DTD. Before referencing the entity, the stylesheet DTD must define a result-elements parameter entity listing the allowed result element types. For example:

    <!ENTITY % result-elements "
      | fo:inline-sequence
      | fo:block
    ">

    Such result elements should be declared to have xsl:use-attribute-sets and xsl:extension-element-prefixes attributes. The following entity declares the result-element-atts parameter for this purpose. The content that XSLT allows for result elements is the same as it allows for the XSLT elements that are declared in the following entity with a content model of %template;. The DTD may use a more restrictive content model than %template; to reflect the constraints of the result DTD.

    The DTD may define the non-xsl-top-level parameter entity to allow additional top-level elements from namespaces other than the XSLT namespace.

    The use of the xsl: prefix in this DTD does not imply that XSLT stylesheets are required to use this prefix. Any of the elements declared in this DTD may have attributes whose name starts with xmlns: or is equal to xmlns in addition to the attributes declared in this DTD.

    &XSLT.ns; ]]>
    Examples

    Document Example

    This example is a stylesheet for transforming documents that conform to a simple DTD into XHTML . The DTD is:

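    <!ELEMENT doc (title, chapter*)>
    <!ELEMENT chapter (title, (para|note)*, section*)>
    <!ELEMENT section (title, (para|note)*)>
    <!ELEMENT title (#PCDATA|emph)*>
    <!ELEMENT para (#PCDATA|emph)*>
    <!ELEMENT note (#PCDATA|emph)*>
    <!ELEMENT emph (#PCDATA|emph)*>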

    The stylesheet is:

    <xsl:stylesheet version="1.0" xmlns:xsl="&XSLT.ns;" xmlns="&XHTML.ns;"> <xsl:value-of select="title"/>

    NOTE:


    With the following input document

    Document Title Chapter Title
    Section Title This is a test. This is a note.
    Another Section Title This is another test. This is another note.

    it would produce the following result

    <?xml version="1.0" encoding="iso-8859-1"?> <html xmlns="&XHTML.ns;"> Document Title

    Document Title

    Chapter Title

    Section Title

    This is a test.

    NOTE: This is a note.

    Another Section Title

    This is another test.

    NOTE: This is another note.

    Data Example

    This is an example of transforming some data represented in XML using three different XSLT stylesheets to produce three different representations of the data: HTML, SVG and VRML.

    The input data is:

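    <sales>

      <division id="North">
        <revenue>10</revenue>
        <growth>9</growth>
        <bonus>7</bonus>
      </division>

      <division id="South">
        <revenue>4</revenue>
        <growth>3</growth>
        <bonus>4</bonus>
      </division>

      <division id="West">
        <revenue>6</revenue>
        <growth>-1.5</growth>
        <bonus>2</bonus>
      </division>

    </sales>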

    The following stylesheet, which uses the simplified syntax described in , transforms the data into HTML:

    <html xsl:version="1.0" xmlns:xsl="&XSLT.ns;" Sales Results By Division
    Division Revenue Growth Bonus
    color:red

    The HTML output is:

    Sales Results By Division

    Division  Revenue  Growth  Bonus
    North     10       9       7
    West      6        -1.5    2
    South     4        3       4

    The following stylesheet transforms the data into SVG:

    <xsl:stylesheet version="1.0" xmlns:xsl="&XSLT.ns;" Revenue Division ]]>

    The SVG output is:

    Revenue Division North 10 South 4 West 6 ]]>

    The following stylesheet transforms the data into VRML:

    <xsl:stylesheet version="1.0" xmlns:xsl="&XSLT.ns;"> #VRML V2.0 utf8 # externproto definition of a single bar element EXTERNPROTO bar [ field SFInt32 x field SFInt32 y field SFInt32 z field SFString name ] "http://www.vrml.org/WorkingGroups/dbwork/barProto.wrl" # inline containing the graph axes Inline { url "http://www.vrml.org/WorkingGroups/dbwork/barAxes.wrl" } bar { x y z name "" } ]]>

    The VRML output is:

    Acknowledgements

    The following have contributed to authoring this draft:

    Daniel Lipkin, Saba
    Jonathan Marsh, Microsoft
    Henry Thompson, University of Edinburgh
    Norman Walsh, Arbortext
    Steve Zilles, Adobe

    This specification was developed and approved for publication by the W3C XSL Working Group (WG). WG approval of this specification does not necessarily imply that all WG members voted for its approval. The current members of the XSL WG are:

    Sharon Adler, IBM (Co-Chair)
    Anders Berglund, IBM
    Perin Blanchard, Novell
    Scott Boag, Lotus
    Larry Cable, Sun
    Jeff Caruso, Bitstream
    James Clark
    Peter Danielsen, Bell Labs
    Don Day, IBM
    Stephen Deach, Adobe
    Dwayne Dicks, SoftQuad
    Andrew Greene, Bitstream
    Paul Grosso, Arbortext
    Eduardo Gutentag, Sun
    Juliane Harbarth, Software AG
    Mickey Kimchi, Enigma
    Chris Lilley, W3C
    Chris Maden, Exemplary Technologies
    Jonathan Marsh, Microsoft
    Alex Milowski, Lexica
    Steve Muench, Oracle
    Scott Parnell, Xerox
    Vincent Quint, W3C
    Dan Rapp, Novell
    Gregg Reynolds, Datalogics
    Jonathan Robie, Software AG
    Mark Scardina, Oracle
    Henry Thompson, University of Edinburgh
    Philip Wadler, Bell Labs
    Norman Walsh, Arbortext
    Sanjiva Weerawarana, IBM
    Steve Zilles, Adobe (Co-Chair)

    Changes from Proposed Recommendation

    The following are the changes since the Proposed Recommendation:

    The xsl:version attribute is required on a literal result element used as a stylesheet (see ).

    The data-type attribute on xsl:sort can use a prefixed name to specify a data-type not defined by XSLT (see ).

    Features under Consideration for Future Versions of XSLT

    The following features are under consideration for versions of XSLT after XSLT 1.0:

    a conditional expression;

    support for XML Schema datatypes and archetypes;

    support for something like style rules in the original XSL submission;

    an attribute to control the default namespace for names occurring in XSLT attributes;

    support for entity references;

    support for DTDs in the data model;

    support for notations in the data model;

    a way to get back from an element to the elements that reference it (e.g. by IDREF attributes);

    an easier way to get an ID or key in another document;

    support for regular expressions for matching against any or all of text nodes, attribute values, attribute names, element type names;

    case-insensitive comparisons;

    normalization of strings before comparison, for example for compatibility characters;

    a string resolve(node-set) function that treats the value of the argument as a relative URI and turns it into an absolute URI using the base URI of the node;

    multiple result documents;

    defaulting the select attribute on xsl:value-of to the current node;

    an attribute on xsl:attribute to control how the attribute value is normalized;

    additional attributes on xsl:sort to provide further control over sorting, such as relative order of scripts;

    a way to put the text of a resource identified by a URI into the result tree;

    allow unions in steps (e.g. foo/(bar|baz));

    allow for result tree fragments all operations that are allowed for node-sets;

    a way to group together consecutive nodes having duplicate subelements or attributes;

    features to make handling of the HTML style attribute more convenient.

    XML-XSLT-0.48/examples/XSLT.xsl0100644000076500007650000000243507420523607016247 0ustar jonathanjonathan <xsl:value-of select="/pod/head/title" />

    NAME

  •      
         
    XML-XSLT-0.48/examples/grammar2.xsl0100644000076500007650000000247307115344403017163 0ustar jonathanjonathan Example application of XML::XSLT

    Example application of XML::XSLT

    Extraction of grammar rules from Recommendations using 'Lazy' evaluation (all nodes of the trees are parsed and templates are searched for. Most likely very slow compared to the other version, which directly selects all prod nodes that are present in the document tree)
    [] ::=
    XML-XSLT-0.48/examples/grammar2.xml0100644000076500007650000046717507120110174017163 0ustar jonathanjonathan "> '"> amp, lt, gt, apos, quot"> ]>
    Extensible Markup Language (XML) 1.0 REC-xml-&iso6.doc.date; W3C Recommendation &draft.day;&draft.month;&draft.year; http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date; http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps http://www.w3.org/TR/REC-xml http://www.w3.org/TR/PR-xml-971208 Tim Bray Textuality and Netscape tbray@textuality.com Jean Paoli Microsoft jeanpa@microsoft.com C. M. Sperberg-McQueen University of Illinois at Chicago cmsmcq@uic.edu

    The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

    This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

    This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

    This specification uses the term URI, which is defined by , a work in progress expected to update and .

    The list of known errors in this specification is available at http://www.w3.org/XML/xml-19980210-errata.

    Please report errors in this document to xml-editor@w3.org.

    Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML Working Group, 1996, 1997.

    Created in electronic form.

    English Extended Backus-Naur Form (formal grammar) 1997-12-03 : CMSMcQ : yet further changes 1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997) 1997-12-02 : CMSMcQ : deal with as many corrections and comments from the proofreaders as possible: entify hard-coded document date in pubdate element, change expansion of entity WebSGML, update status description as per Dan Connolly (am not sure about refernece to Berners-Lee et al.), add 'The' to abstract as per WG decision, move Relationship to Existing Standards to back matter and combine with References, re-order back matter so normative appendices come first, re-tag back matter so informative appendices are tagged informdiv1, remove XXX XXX from list of 'normative' specs in prose, move some references from Other References to Normative References, add RFC 1738, 1808, and 2141 to Other References (they are not normative since we do not require the processor to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee et al.), move notation section to end of body, drop URIchar non-terminal and use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls', move reference to Aho et al. into appendix (Tim's right), add prose note saying that hash marks and fragment identifiers are NOT part of the URI formally speaking, and are NOT legal in system identifiers (processor 'may' signal an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his own, Eve Maler, NOT DONE YET: change binary / text to unparsed / parsed. 
handle James's suggestion about < in attribute values uppercase hex characters, namechar list, 1997-12-01 : JB : add some column-width parameters 1997-12-01 : CMSMcQ : begin round of changes to incorporate recent WG decisions and other corrections: binding sources of character encoding info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped line), drop SDD from EncodingDecl, change text at version number 1.0, drop misleading (wrong!) sentence about ignorables and extenders, modify definition of PCData to make bar on msc grammatical, change grammar's handling of internal subset (drop non-terminal markupdecls), change definition of includeSect to allow conditional sections, add integral-declaration constraint on internal subset, drop misleading / dangerous sentence about relationship of entities with system storage objects, change table body tag to htbody as per EM change to DTD, add rule about space normalization in public identifiers, add description of how to generate our name-space rules from Unicode character database (needs further work!). 1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance. 1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs, lotsa little edits for style 1997-09-25 : TB : Change to elm's new DTD, with substantial detail cleanup as a side-effect 1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents (thanks to Makoto Murata) Allow all empty elements to have end-tags, consistent with SGML TC (as per JJC). 1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce the term 'empty-element tag', note that all empty elements may use it, and elements declared EMPTY must use it. Add WFC requiring encoding decl to come first in an entity. Redefine notations to point to PIs as well as binary entities. Change autodetection table by removing bytes 3 and 4 from examples with Byte Order Mark. 
Add content model as a term and clarify that it applies to both mixed and element content. 1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to productions for choice, seq, Mixed, NotationType, Enumeration. Follow James Clark's suggestion and prohibit conditional sections in internal subset. TO DO: simplify production for ignored sections as a result, since we don't need to worry about parsers which don't expand PErefs finding a conditional section. 1997-06-29 : TB : various edits 1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments and some dead material. Revise occurrences of % in grammar to exploit Henry Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating to element content (?). 1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for draconian error handling (introduce the term Fatal Error). RE deleta est (changing wording from original announcement to restrict the requirement to validating parsers). Tag definition of validating processor and link to it. Add colon as name character. Change def of %operator. Change standard definitions of lt, gt, amp. Strip leading zeros from #x00nn forms. 1997-04-02 : CMSMcQ : final corrections of editorial errors found in last night's proofreading. Reverse course once more on well-formed: Webster's Second hyphenates it, and that's enough for me. 1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self 1997-03-31 : Tim Bray : many changes 1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations. Changed Ident element to accept def attribute. Allow normalization of Unicode characters. move def of systemliteral into section on literals. 1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso, and self. 
Among other things: give in on "well formed" (Terry is right), tentatively rename QuotedCData as AttValue and Literal as EntityValue to be more informative, since attribute values are the only place QuotedCData was used, and vice versa for entity text and Literal. (I'd call it Entity Text, but 8879 uses that name for both internal and external entities.) 1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except in the one case where it meant 'may or may not'. 1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver 1997-03-21 : CMSMcQ : correct as many reported errors as possible. 1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec. 1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for WWW conference April 1997: restore some of the internal entity references (e.g. to docdate, etc.), change character xA0 to &nbsp; and define nbsp as &#160;, and refill a lot of paragraphs for legibility. 1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames, Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section' as 'CDATA section' passim. Also edits from James Clark: Define the set of characters from which [^abc] subtracts. Charref should use just [0-9] not Digit. Location info needs cleaner treatment: remove? (ERB question). One example of a PI has wrong pic. Clarify discussion of encoding names. Encoding failure should lead to unspecified results; don't prescribe error recovery. Don't require exposure of entity boundaries. Ignore white space in element content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And some of my own: Correct productions for content model: model cannot consist of a name, so "elements ::= cp" is no good. 1996-11-11 : CMSMcQ : revise for style. 
Add new rhs to entity declaration, for parameter entities. 1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names, characters. Add sections on parameter entities, conditional sections. Still to do: Add compatibility note on deterministic content models. Finish stylistic revision. 1996-10-31 : TB : Add Entity Handling section 1996-10-30 : TB : Clean up term & termdef. Slip in ERB decision re EMPTY. 1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions. Change comments back to //. Introduce language for XML namespace reservation. Add section on white-space handling. Lots more cleanup. 1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters are not integers. Comments are /* */ not //. Add bibliographic refs to 10646, HyTime, Unicode. Rename old Cdata as MsData since it's only seen in marked sections. Call them attribute-value pairs not name-value pairs, except once. Internal subset is optional, needs '?'. Implied attributes should be signaled to the app, not have values supplied by processor. 1996-10-16 : TB : track down & excise all DSD references; introduce some EBNF for entity declarations. 1996-10-?? : TB : consistency check, fix up scraps so they all parse, get formatter working, correct a few productions. 1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational changes: Replace a few literals with xmlpio and pic entities, to make them consistent and ensure we can change pic reliably when the ERB votes. Drop paragraph on recognizers from notation section. Add match, exact match to terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments, PIs, and marked sections in discussion of delimiter escaping. Streamline discussion of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl, and add section on partial-DTD summary PIs to end of Logical Structures section. Revise DSD syntax section to use Tim's subset-in-a-PI mechanism. 
1996-10-10 : TB : eliminate name recognizers (and more?) 1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters) 1996-10-09 : CMSMcQ : re-unite everything for convenience, at least temporarily, and revise quickly 1996-10-08 : TB : first major homogenization pass 1996-10-08 : TB : turn "current" attribute on div type into CDATA 1996-10-02 : TB : remould into skeleton + entities 1996-09-30 : CMSMcQ : add a few more sections prior to exchange with Tim. 1996-09-20 : CMSMcQ : finish transcribing notes. 1996-09-19 : CMSMcQ : begin transcribing notes for draft. 1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping
    Introduction

    Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language (ISO 8879). By construction, XML documents are conforming SGML documents.

    XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

    A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

    Origin and Goals

    XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

    The design goals for XML are:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness in XML markup is of minimal importance.

    This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.

    This version of the XML specification &doc.distribution;.

    Terminology

    The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

    may: Conforming documents and XML processors are permitted to but need not behave as described.

    must: Conforming documents and XML processors are required to behave as described; otherwise they are in error.

    error: A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

    fatal error: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

    at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

    validity constraint: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must, at user option, be reported by validating XML processors.

    well-formedness constraint: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.

    match: (Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. At user option, processors may normalize such characters to some canonical form. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production. (Of content and content models:) An element matches its declaration when it conforms in the fashion described in the constraint "Element Valid".

    for compatibility: A feature of XML included solely to ensure that XML remains compatible with SGML.

    for interoperability: A non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879.

    Documents

    A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.

    Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in "Well-Formed Parsed Entities".

    Well-Formed XML Documents

    A textual object is a well-formed XML document if:

    Taken as a whole, it matches the production labeled document.

    It meets all the well-formedness constraints given in this specification.

    Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

    Document
        document ::= prolog element Misc*

    Matching the document production implies that:

    It contains one or more elements.

    There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

    As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.
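    The nesting rules above can be exercised with an off-the-shelf parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the XML processor; the helper names is_well_formed and parent_map are illustrative and not part of this specification. A parse failure stands in for a fatal error, and the parent/child relation is rebuilt from the element tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser reads `text` without reporting an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def parent_map(root):
    """Map every non-root element C to the element P that is its parent."""
    return {child: parent for parent in root.iter() for child in parent}


doc = ET.fromstring("<doc><p><em>hi</em></p></doc>")
parents = parent_map(doc)
em = doc.find("./p/em")

print(is_well_formed("<a><b/></a>"))      # properly nested
print(is_well_formed("<a><b></a></b>"))   # end-tags cross, not well-formed
print(is_well_formed("<a/><b/>"))         # two root elements, not well-formed
print(parents[em].tag)                    # parent of <em>
```

    The root element never appears as a key in the map, which mirrors the statement that no part of the root appears in the content of any other element.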

    Characters

    A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of "compatibility characters", as defined in section 6.8 of the Unicode standard, is discouraged.

    Character Range
        Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
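    As an illustration, the Char production can be transcribed into a regular-expression character class. This is a sketch in Python, not a normative definition; the names CHAR_RE and is_legal_text are hypothetical.

```python
import re

# Character class transcribed from the Char production:
# tab, line feed, carriage return, and the three legal code point ranges.
CHAR_RE = re.compile(
    r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def is_legal_text(text):
    """Return True if every character of `text` matches Char."""
    return CHAR_RE.fullmatch(text) is not None


print(is_legal_text("Hello,\tworld!\n"))  # tab and line feed are legal
print(is_legal_text("bad\x00byte"))       # NUL is outside every range
print(is_legal_text("\ufffe"))            # FFFE is explicitly excluded
```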

    The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in "Character Encoding in Entities".

    Common Syntactic Constructs

    This section defines some symbols used widely in the grammar.

    S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

    White Space
        S ::= (#x20 | #x9 | #xD | #xA)+

    Characters are classified for convenience as letters, digits, or other characters. Letters consist of an alphabetic or syllabic base character possibly followed by one or more combining characters, or of an ideographic character. Full definitions of the specific characters in each class are given in the appendix "Character Classes".

    A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

    The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

    An Nmtoken (name token) is any mixture of name characters.

    Names and Tokens
        NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
        Name ::= (Letter | '_' | ':') (NameChar)*
        Names ::= Name (S Name)*
        Nmtoken ::= (NameChar)+
        Nmtokens ::= Nmtoken (S Nmtoken)*
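    The Name and Nmtoken productions can be approximated with regular expressions. The sketch below is deliberately simplified: it assumes Letter is [A-Za-z] and Digit is [0-9], and it omits CombiningChar and Extender entirely, so it accepts only an ASCII subset of the names the grammar allows. The identifiers is_name, is_nmtoken, and is_reserved are hypothetical.

```python
import re

# ASCII-only stand-ins for the character classes (an assumption, see above).
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME_RE = re.compile(r"[A-Za-z_:]" + NAME_CHAR + r"*")
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")


def is_name(s):
    """Name ::= (Letter | '_' | ':') (NameChar)*"""
    return NAME_RE.fullmatch(s) is not None


def is_nmtoken(s):
    """Nmtoken ::= (NameChar)+"""
    return NMTOKEN_RE.fullmatch(s) is not None


def is_reserved(s):
    """Names starting with 'xml' in any letter case are reserved."""
    return s[:3].lower() == "xml"


print(is_name("greeting"), is_name("1abc"), is_nmtoken("1abc"))
print(is_reserved("XmlThing"), is_reserved("greeting"))
```

    A name token may start with a digit, hyphen, or full stop, whereas a name may not; that is the only difference between the two patterns.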

    Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

    Literals
        EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
        AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
        SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
        PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
        PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

    Character Data and Markup

    Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions.

    All text that is not markup constitutes the character data of the document.

    The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration; see "Well-Formed Parsed Entities". If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

    In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

    To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

    Character Data
        CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
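    The escaping rules above can be tried out with the escape and unescape helpers from Python's standard xml.sax.saxutils module. This is a sketch of one way to apply them, not a required implementation; the wrapper name escape_attr is hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() rewrites '&', '<' and '>' in character data; escaping '>' also
# covers the "]]>" case described above.
text = "AT&T <b> ]]>"
escaped = escape(text)
print(escaped)


def escape_attr(value):
    """Escape character data for use inside an attribute value,
    additionally replacing both kinds of quotation mark."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


attr = escape_attr("say \"hi\" & 'bye'")
print(attr)

# unescape() reverses the three default replacements.
print(unescape(escaped) == text)
```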

    Comments

    Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.

    Comments
        Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

    An example of a comment: <!-- declarations for <head> & <body> -->
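    The Comment production translates almost directly into a regular expression. The sketch below, in Python, checks only the hyphen rule; for brevity it does not also test each character against Char, which is a simplification. The names COMMENT_RE and is_comment are hypothetical.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Each step consumes either one non-hyphen, or a hyphen followed by a
# non-hyphen, so "--" can never appear inside the comment body.
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(s):
    """Return True if `s` is a single, complete comment."""
    return COMMENT_RE.fullmatch(s) is not None


print(is_comment("<!-- declarations for <head> & <body> -->"))
print(is_comment("<!-- a -- b -->"))  # double hyphen inside the body
print(is_comment("<!-- a --->"))      # body would end with a hyphen
```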

    Processing Instructions

    Processing instructions (PIs) allow documents to contain instructions for applications.

    Processing Instructions
        PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
        PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

    PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets.

    CDATA Sections

    CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":

    CDATA Sections
        CDSect ::= CDStart CData CDEnd
        CDStart ::= '<![CDATA['
        CData ::= (Char* - (Char* ']]>' Char*))
        CDEnd ::= ']]>'

    Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

    An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup: <![CDATA[<greeting>Hello, world!</greeting>]]>
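    The behavior in the example can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the wrapper element doc and the helper to_cdata are hypothetical. Because CDATA sections cannot contain "]]>", the helper uses the common workaround of closing the section and reopening it in the middle of that string.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section the tags are plain character data.
root = ET.fromstring(
    "<doc><![CDATA[<greeting>Hello, world!</greeting>]]></doc>"
)
print(root.text)   # the literal text, angle brackets included
print(len(root))   # number of child elements


def to_cdata(text):
    """Wrap `text` in CDATA sections, splitting any ']]>' across two
    adjacent sections so that no section contains the close delimiter."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


tricky = "a ]]> b"
round_trip = ET.fromstring("<doc>" + to_cdata(tricky) + "</doc>").text
print(round_trip == tricky)
```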

    Prolog and Document Type Declaration

    XML documents may, and should, begin with an XML declaration which specifies the version of XML being used. For example, the following is a complete XML document, well-formed but not valid:

        <?xml version="1.0"?>
        <greeting>Hello, world!</greeting>

    and so is this:

        <greeting>Hello, world!</greeting>

    The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML working group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.

    The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.

    The document type declaration must appear before the first element in the document.

    Prolog
        prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
        XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
        VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
        Eq ::= S? '=' S?
        VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
        Misc ::= Comment | PI | S

    The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.

    A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration. These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For fuller information, see "Physical Structures".

    Document Type Definition
        doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
        markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment

    The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.

    Validity Constraint: Root Element Type

    The Name in the document type declaration must match the element type of the root element.

    Validity Constraint: Proper Declaration/PE Nesting

    Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.

    Well Formedness Constraint: PEs in Internal Subset

    In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

    Like the internal subset, the external subset and any external parameter entities referred to in the DTD must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.

    External Subset
        extSubset ::= TextDecl? extSubsetDecl
        extSubsetDecl ::= ( markupdecl | conditionalSect | PEReference | S )*

    The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.

    An example of an XML document with a document type declaration:

        <?xml version="1.0"?>
        <!DOCTYPE greeting SYSTEM "hello.dtd">
        <greeting>Hello, world!</greeting>

    The system identifier "hello.dtd" gives the URI of a DTD for the document.

    The declarations can also be given locally, as in this example:

        <?xml version="1.0" encoding="UTF-8" ?>
        <!DOCTYPE greeting [
          <!ELEMENT greeting (#PCDATA)>
        ]>
        <greeting>Hello, world!</greeting>

    If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
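    A document that carries its declarations in the internal subset can be read without fetching anything external. The sketch below assumes Python's standard xml.etree.ElementTree module, which is non-validating but does expand internal general entities. The entity named who is a hypothetical addition made for illustration.

```python
import xml.etree.ElementTree as ET

# The internal subset declares the element type and one internal entity.
# The entity "who" is hypothetical and exists only for this illustration.
DOC = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY who "world">
]>
<greeting>Hello, &who;!</greeting>"""

root = ET.fromstring(DOC)
print(root.tag)    # element type named in the document type declaration
print(root.text)   # character data with the entity reference expanded
```

    The parser here does not check the Root Element Type constraint; that is the job of a validating processor.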

    Standalone Document Declaration

    Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity.

    Standalone Document Declaration
        SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

    In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a document, of references to external entities, when those entities are internally declared, does not change its standalone status.

    If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.

    Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

    Validity Constraint: Standalone Document Declaration

    The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

    attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

    entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

    attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

    element types with element content, if white space occurs directly within any instance of those types.

    An example XML declaration with a standalone document declaration: <?xml version="1.0" standalone='yes'?>
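    An application can observe the standalone document declaration through its parser. The sketch below assumes Python's standard xml.parsers.expat module, whose XmlDeclHandler callback receives the version, the encoding, and a standalone flag (1 for "yes", 0 for "no", -1 when the declaration is absent). The function name read_xml_decl and the greeting element are hypothetical.

```python
import xml.parsers.expat


def read_xml_decl(data):
    """Parse `data` (bytes) and return what its XML declaration states.
    The result is empty if the document has no XML declaration."""
    info = {}

    def on_decl(version, encoding, standalone):
        info["version"] = version
        info["encoding"] = encoding
        # Map expat's integer flag back to the literal value, or None.
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return info


with_sd = read_xml_decl(b"<?xml version=\"1.0\" standalone='yes'?><greeting/>")
without_sd = read_xml_decl(b'<?xml version="1.0"?><greeting/>')
no_decl = read_xml_decl(b"<greeting/>")
print(with_sd["version"], with_sd["standalone"])
print(without_sd["standalone"])
```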

    White Space Handling

    In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

    An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

    A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve". For example:

        <!ATTLIST poem xml:space (default|preserve) 'preserve'>

    The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

    The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.
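    The inheritance rule for xml:space can be sketched as a tree walk. The example assumes Python's standard xml.etree.ElementTree module, which reports the attribute under the XML namespace URI. It looks only at attributes present in the instance and ignores defaults supplied by a DTD, which is a simplification. The function name effective_space and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(root):
    """Return {element: 'default' | 'preserve' | None}, where None means
    that no intention has been signaled for that element."""
    result = {}

    def walk(elem, inherited):
        # An explicit attribute overrides whatever the ancestors declared.
        value = elem.get(XML_SPACE, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, None)
    return result


DOC = (
    '<doc><poem xml:space="preserve"><line> a </line>'
    '<note xml:space="default"><p/></note></poem><para/></doc>'
)
by_tag = {e.tag: v for e, v in effective_space(ET.fromstring(DOC)).items()}
print(by_tag)
```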

    End-of-Line Handling

    XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

    To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can conveniently be produced by normalizing all line breaks to #xA on input, before parsing.)
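    The normalization described above can be sketched in a few lines of Python. This is only an illustration of the rule, not part of the specification; the function name normalize_line_breaks is an assumption made for this example.

```python
def normalize_line_breaks(text: str) -> str:
    """Map each #xD#xA pair and each standalone #xD to a single #xA."""
    # Replace the two-character sequence first; otherwise its #xD would be
    # converted on its own and the pair would yield two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    print(repr(normalize_line_breaks("one\r\ntwo\rthree\nfour")))
```

    Applying the function to the raw entity text before parsing gives the behavior the paragraph above requires of the processor.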

    Language Identification

    In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by , "Tags for the Identification of Languages":

    Language Identification
        LanguageID ::= Langcode ('-' Subcode)*
        Langcode   ::= ISO639Code | IanaCode | UserCode
        ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
        IanaCode   ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
        UserCode   ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
        Subcode    ::= ([a-z] | [A-Z])+

    The Langcode may be any of the following:

    a two-letter language code as defined by , "Codes for the representation of names of languages"

    a language identifier registered with the Internet Assigned Numbers Authority ; these begin with the prefix "i-" (or "I-")

    a language identifier assigned by the user, or agreed on between parties in private use; these must begin with the prefix "x-" or "X-" in order to ensure that they do not conflict with names later standardized or registered with IANA

    There may be any number of Subcode segments; if the first subcode segment exists and the Subcode consists of two letters, then it must be a country code from , "Codes for the representation of names of countries." If the first subcode consists of more than two letters, it must be a subcode for the language in question registered with IANA, unless the Langcode begins with the prefix "x-" or "X-".

    It is customary to give the language code in lower case, and the country code (if any) in upper case. Note that these values, unlike other names in XML documents, are case insensitive.
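    As a rough illustration, the LanguageID production can be transcribed into a regular expression. The sketch below checks syntax only; it does not consult the ISO or IANA registries, and the names LANGUAGE_ID and is_language_id are assumptions made for this example.

```python
import re

# Langcode is a two-letter code, an "i-" (IANA) code, or an "x-" (user) code,
# followed by any number of "-Subcode" segments.
LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}|[iI]-[A-Za-z]+|[xX]-[A-Za-z]+)"  # Langcode
    r"(?:-[A-Za-z]+)*"                                # ('-' Subcode)*
)


def is_language_id(value: str) -> bool:
    """Return True if value matches the LanguageID production syntactically."""
    return LANGUAGE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("en", "en-GB", "x-klingon", "eng", "en-"):
        print(candidate, is_language_id(candidate))
```

    Because the values are case insensitive, an application comparing two identifiers would typically fold both to the same case first.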

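    For example:

        <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
        <p xml:lang="en-GB">What colour is it?</p>
        <p xml:lang="en-US">What color is it?</p>
        <sp who="Faust" desc='leise' xml:lang="de">
          <l>Habe nun, ach! Philosophie,</l>
          <l>Juristerei, und Medizin</l>
          <l>und leider auch Theologie</l>
          <l>durchaus studiert mit heißem Bemüh'n.</l>
        </sp>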

    The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
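    The inheritance rule can be sketched with Python's standard xml.etree.ElementTree module, which reports xml:lang under the reserved XML namespace. The function name effective_languages and the sample document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml: prefix as this fixed namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None, result=None):
    """Collect (tag, language) pairs, letting xml:lang apply to descendants."""
    if result is None:
        result = []
    # An xml:lang on this element overrides the inherited value.
    language = element.get(XML_LANG, inherited)
    result.append((element.tag, language))
    for child in element:
        effective_languages(child, language, result)
    return result


SAMPLE = '<doc xml:lang="en"><p>Hello</p><p xml:lang="de"><q>Hallo</q></p></doc>'

if __name__ == "__main__":
    for tag, language in effective_languages(ET.fromstring(SAMPLE)):
        print(tag, language)
```

    The q element carries no xml:lang of its own, so it takes the value declared on its parent.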

    A simple declaration for xml:lang might take the form xml:lang NMTOKEN #IMPLIED but specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:

        <!ATTLIST poem   xml:lang NMTOKEN 'fr'>
        <!ATTLIST gloss  xml:lang NMTOKEN 'en'>
        <!ATTLIST note   xml:lang NMTOKEN 'en'>

    Logical Structures

    Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.

    Element
        element ::= EmptyElemTag | STag content ETag

    This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.

    Element Type Match

    The Name in an element's end-tag must match the element type in the start-tag.

    Element Valid

    An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

    The declaration matches EMPTY and the element has no content.

    The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between each pair of child elements.

    The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

    The declaration matches ANY, and the types of any child elements have been declared.

    Start-Tags, End-Tags, and Empty-Element Tags

    The beginning of every non-empty XML element is marked by a start-tag.

    Start-tag
        STag      ::= '<' Name (S Attribute)* S? '>'
        Attribute ::= Name Eq AttValue

    The Name in the start- and end-tags gives the element's type. The Name-AttValue pairs are referred to as the attribute specifications of the element, with the Name in each pair referred to as the attribute name and the content of the AttValue (the text between the ' or " delimiters) as the attribute value.

    Unique Att Spec

    No attribute name may appear more than once in the same start-tag or empty-element tag.

    Attribute Value Type

    The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see .)

    No External Entity References

    Attribute values cannot contain direct or indirect entity references to external entities.

    No < in Attribute Values

    The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.
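    Two of the constraints above (Unique Att Spec and No < in Attribute Values) can be observed with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree module, which rejects documents that are not well-formed by raising ParseError; the helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False on a ParseError."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Two distinct attribute names: accepted.
    print(is_well_formed('<termdef id="dt-dog" term="dog"/>'))
    # The same attribute name twice in one tag: rejected.
    print(is_well_formed('<termdef id="dt-dog" id="dt-cat"/>'))
    # A literal < inside an attribute value: rejected.
    print(is_well_formed('<termdef term="a<b"/>'))
```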

    An example of a start-tag: <termdef id="dt-dog" term="dog">

    The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:

    End-tag
        ETag ::= '</' Name S? '>'

    An example of an end-tag: </termdef>

    The text between the start-tag and end-tag is called the element's content:

    Content of Elements
        content ::= (element | CharData | Reference | CDSect | PI | Comment)*

    If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form:

    Tags for Empty Elements
        EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

    Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag must be used, and can only be used, for elements which are declared EMPTY.

    Examples of empty elements:

        <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
        <br></br>
        <br/>

    Element Type Declarations

    The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

    Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

    An element type declaration takes the form:

    Element Type Declaration
        elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
        contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

    where the Name gives the element type being declared.

    Unique Element Type Declaration

    No element type may be declared more than once.

    Examples of element type declarations:

        <!ELEMENT br EMPTY>
        <!ELEMENT p (#PCDATA|emph)* >
        <!ELEMENT %name.para; %content.para; >
        <!ELEMENT container ANY>

    Element Content

    An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear. The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

    Element-content Models
        children ::= (choice | seq) ('?' | '*' | '+')?
        cp       ::= (Name | choice | seq) ('?' | '*' | '+')?
        choice   ::= '(' S? cp ( S? '|' S? cp )* S? ')'
        seq      ::= '(' S? cp ( S? ',' S? cp )* S? ')'

    where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

    The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see .

    Proper Group/PE Nesting

    Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

    For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

    Examples of element-content models:

        <!ELEMENT spec (front, body, back?)>
        <!ELEMENT div1 (head, (p | list | note)*, div2*)>
        <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
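    Since a content model is a regular expression over element types, matching can be sketched by translating the model into a Python regular expression over the sequence of child names. This is a simplified illustration: it assumes parameter-entity references have already been expanded, it does not check the determinism requirement mentioned above, and the names model_to_regex and content_matches are hypothetical.

```python
import re

# Tokens of a content model: element names and the grammar's punctuation.
_TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[()|,?*+]")


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model into a regex over child names."""
    parts = []
    for token in _TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            # Each name is followed by a space so "p" cannot match "para".
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def content_matches(model: str, children: list) -> bool:
    """Return True if the list of child element types matches the model."""
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


if __name__ == "__main__":
    print(content_matches("(front, body, back?)", ["front", "body"]))
    print(content_matches("(head, (p | list | note)*, div2*)", ["head", "p", "note", "div2"]))
    print(content_matches("(front, body, back?)", ["body", "front"]))
```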

    Mixed Content

    An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements. In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:

    Mixed-content Declaration
        Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

    where the Names give the types of elements that may appear as children.

    No Duplicate Types

    The same name must not appear more than once in a single mixed-content declaration.

    Examples of mixed content declarations:

        <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
        <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
        <!ELEMENT b (#PCDATA)>

    Attribute-List Declarations

    Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in . Attribute-list declarations may be used:

    To define the set of attributes pertaining to a given element type.

    To establish type constraints for these attributes.

    To provide default values for attributes.

    Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:

    Attribute-list Declaration
        AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
        AttDef      ::= S Name S AttType S DefaultDecl

    The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

    When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.
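    The merging rule can be sketched as follows, assuming the attribute definitions have already been parsed into (element type, attribute name, definition) tuples in document order. The function name merge_attlists and the tuple layout are assumptions made for this illustration.

```python
def merge_attlists(definitions):
    """Merge attribute definitions per element type; the first one is binding."""
    merged = {}
    for element_type, attribute_name, definition in definitions:
        attributes = merged.setdefault(element_type, {})
        # setdefault keeps an existing entry, so later definitions of the
        # same attribute are ignored.
        attributes.setdefault(attribute_name, definition)
    return merged


if __name__ == "__main__":
    declared = [
        ("list", "type", '(bullets|ordered|glossary) "ordered"'),
        ("list", "id", "ID #IMPLIED"),
        ("list", "type", "CDATA #IMPLIED"),  # ignored: type is already defined
    ]
    print(merge_attlists(declared))
```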

    Attribute Types

    XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints, as noted:

    Attribute Types
        AttType       ::= StringType | TokenizedType | EnumeratedType
        StringType    ::= 'CDATA'
        TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

    ID

    Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

    One ID per Element Type

    No element type may have more than one ID attribute specified.

    ID Attribute Default

    An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

    IDREF

    Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e., IDREF values must match the value of some ID attribute.

    Entity Name

    Values of type ENTITY must match the Name production, and values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

    Name Token

    Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

    Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types:

    Enumerated Attribute Types
        EnumeratedType ::= NotationType | Enumeration
        NotationType   ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
        Enumeration    ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'

    A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

    Notation Attributes

    Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

    Enumeration

    Values of this type must match one of the Nmtoken tokens in the declaration.

    For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

    Attribute Defaults

    An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

    Attribute Defaults
        DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

    In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.

    Required Attribute

    If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

    Attribute Default Legal

    The declared default value must meet the lexical constraints of the declared attribute type.

    Fixed Attribute Default

    If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

    Examples of attribute-list declarations:

        <!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
        <!ATTLIST list type (bullets|ordered|glossary) "ordered">
        <!ATTLIST form method CDATA #FIXED "POST">
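    The default-handling rules and the three constraints above can be sketched in Python. The data layout is an assumption made for this illustration: each declaration maps an attribute name to a (keyword, default) pair, where keyword is "#REQUIRED", "#IMPLIED", "#FIXED", or None for a plain default value. The function name apply_defaults is likewise hypothetical.

```python
def apply_defaults(specified, declarations):
    """Return the attributes an application would see for one element."""
    result = dict(specified)
    for name, (keyword, default) in declarations.items():
        if keyword == "#REQUIRED":
            if name not in result:
                raise ValueError(f"required attribute {name!r} is missing")
        elif keyword == "#IMPLIED":
            continue  # no default value is provided
        elif name not in result:
            # Behave as though the attribute were present with the default.
            result[name] = default
        elif keyword == "#FIXED" and result[name] != default:
            raise ValueError(f"attribute {name!r} must have the value {default!r}")
    return result


if __name__ == "__main__":
    list_decl = {"type": (None, "ordered")}
    form_decl = {"method": ("#FIXED", "POST")}
    print(apply_defaults({}, list_decl))
    print(apply_defaults({"type": "bullets"}, list_decl))
    print(apply_defaults({}, form_decl))
```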

    Attribute-Value Normalization

    Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:

    a character reference is processed by appending the referenced character to the attribute value

    an entity reference is processed by recursively processing the replacement text of the entity

    a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity

    other characters are processed by appending them to the normalized value

    If the declared value is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

    All attributes for which no declaration has been read should be treated by a non-validating parser as if declared CDATA.
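    The white-space part of this procedure can be sketched as below. The sketch assumes that character and entity references have not yet been expanded in the input, so it covers only the literal white-space rule and the extra step for non-CDATA types; the function name normalize_attribute_value and its is_cdata parameter are assumptions made for this example.

```python
import re


def normalize_attribute_value(raw: str, is_cdata: bool = True) -> str:
    """Apply the white-space rules of attribute-value normalization."""
    # A "#xD#xA" pair yields one space; every other white-space character
    # (#x20, #xD, #xA, #x9) yields one space each.
    value = re.sub(r"\r\n|[ \t\r\n]", " ", raw)
    if not is_cdata:
        # Tokenized and enumerated types: trim, then collapse runs of spaces.
        value = re.sub(r" +", " ", value.strip(" "))
    return value


if __name__ == "__main__":
    raw = " a\t\tb\r\nc "
    print(repr(normalize_attribute_value(raw)))
    print(repr(normalize_attribute_value(raw, is_cdata=False)))
```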

    Conditional Sections

    Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.

    Conditional Section
        conditionalSect    ::= includeSect | ignoreSect
        includeSect        ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
        ignoreSect         ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
        ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
        Ignore             ::= Char* - (Char* ('<![' | ']]>') Char*)

    Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

    If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. Note that for reliable parsing, the contents of even ignored conditional sections must be read in order to detect nested conditional sections and ensure that the end of the outermost (ignored) conditional section is properly detected. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored.

    If the keyword of the conditional section is a parameter-entity reference, the parameter entity must be replaced by its content before the processor decides whether to include or ignore the conditional section.

    An example:

        <!ENTITY % draft 'INCLUDE' >
        <!ENTITY % final 'IGNORE' >

        <![%draft;[
        <!ELEMENT book (comments*, title, body, supplements?)>
        ]]>
        <![%final;[
        <!ELEMENT book (title, body, supplements?)>
        ]]>

    Physical Structures

    An XML document may consist of one or many storage units. These are called entities; they all have content and are all (except for the document entity, see below, and the external DTD subset) identified by name. Each XML document has one entity called the document entity, which serves as the starting point for the XML processor and may contain the whole document.

    Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

    An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.

    Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of ENTITY or ENTITIES attributes.

    General entities are entities for use within the document content. In this specification, general entities are sometimes referred to with the unqualified term entity when this leads to no ambiguity. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts. Furthermore, they occupy different namespaces; a parameter entity and a general entity with the same name are two distinct entities.

    Character and Entity References

    A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

    Character Reference
        CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

    Legal Character

    Characters referred to using character references must match the production for Char.

    If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
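    Decoding of the two reference forms can be sketched in Python. The legality check transcribes the Char production of this specification (#x9, #xA, #xD, and the ranges #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); the function names is_legal_char and expand_char_refs are assumptions made for this example.

```python
import re

# '&#x' hex digits ';'  or  '&#' decimal digits ';'
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_legal_char(code: int) -> bool:
    """Return True if the code point matches the Char production."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


def expand_char_refs(text: str) -> str:
    """Replace every character reference in text with the character itself."""
    def replace(match):
        hex_digits, decimal_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(decimal_digits)
        if not is_legal_char(code):
            raise ValueError(f"character reference to illegal code point {code:#x}")
        return chr(code)

    return _CHAR_REF.sub(replace, text)


if __name__ == "__main__":
    print(expand_char_refs("Type less-than (&#x3C;) or (&#60;)"))
```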

    An entity reference refers to the content of a named entity. References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.

    Entity Reference
        Reference   ::= EntityRef | CharRef
        EntityRef   ::= '&' Name ';'
        PEReference ::= '%' Name ';'

    Entity Declared

    In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

    Entity Declared

    In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in . The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Parsed Entity

    An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

    No Recursion

    A parsed entity must not contain a recursive reference to itself, either directly or indirectly.

    In DTD

    Parameter-entity references may only appear in the DTD.

    Examples of character and entity references:

        Type <key>less-than</key> (&#x3C;) to save options.
        This document was prepared on &docdate; and is classified &security-level;.

    Example of a parameter-entity reference: %ISOLat2;

    Entity Declarations

    Entities are declared thus:

    Entity Declaration
        EntityDecl ::= GEDecl | PEDecl
        GEDecl     ::= '<!ENTITY' S Name S EntityDef S? '>'
        PEDecl     ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
        EntityDef  ::= EntityValue | (ExternalID NDataDecl?)
        PEDef      ::= EntityValue | ExternalID

    The Name identifies the entity in an entity reference or, in the case of an unparsed entity, in the value of an ENTITY or ENTITIES attribute. If the same entity is declared more than once, the first declaration encountered is binding; at user option, an XML processor may issue a warning if entities are declared multiple times.

    Internal Entities

    If the entity definition is an EntityValue, the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration. Note that some processing of entity and character references in the literal entity value may be required to produce the correct replacement text: see .

    An internal entity is a parsed entity.

    Example of an internal entity declaration: <!ENTITY Pub-Status "This is a pre-release of the specification.">

    External Entities

    If the entity is not internal, it is an external entity, declared as follows:

    External Entity Declaration
        ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
        NDataDecl  ::= S 'NDATA' S Name

    If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.

    Notation Declared

    The Name must match the declared name of a notation.

    The SystemLiteral is called the entity's system identifier. It is a URI, which may be used to retrieve the entity. Note that the hash mark (#) and fragment identifier frequently used with URIs are not, formally, part of the URI itself; an XML processor may signal an error if a fragment identifier is given as part of a system identifier. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element type defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs. A URI might thus be relative to the document entity, to the entity containing the external DTD subset, or to some other external parameter entity.

    An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
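    The escaping rule can be sketched as follows. Only non-ASCII characters are touched; ASCII characters, including reserved URI characters, are left as they are. The function name escape_non_ascii and the example.org address are assumptions made for this illustration.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape each non-ASCII character as the %HH form of its UTF-8 bytes."""
    escaped = []
    for character in uri:
        if ord(character) < 128:
            escaped.append(character)
        else:
            # One %HH triplet per UTF-8 byte of the character.
            escaped.extend(f"%{byte:02X}" for byte in character.encode("utf-8"))
    return "".join(escaped)


if __name__ == "__main__":
    print(escape_non_ascii("http://example.org/résumé.xml"))
```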

    In addition to a system identifier, an external identifier may include a public identifier. An XML processor attempting to retrieve the entity's content may use the public identifier to try to generate an alternative URI. If the processor is unable to do so, it must use the URI specified in the system literal. Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
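    The normalization of a public identifier before matching can be sketched in Python. White space is taken here to mean the characters #x20, #x9, #xD and #xA; the function name normalize_public_id is an assumption made for this example.

```python
import re


def normalize_public_id(public_id: str) -> str:
    """Collapse white-space runs to one space and trim both ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", public_id)
    return collapsed.strip(" ")


if __name__ == "__main__":
    raw = "  -//Textuality//TEXT  Standard\n open-hatch boilerplate//EN "
    print(repr(normalize_public_id(raw)))
```

    Two public identifiers that differ only in white space compare equal after this step, which is what makes catalog lookups predictable.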

    Examples of external entity declarations:

        <!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
        <!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml">
        <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

    Parsed Entities

    The Text Declaration

    External parsed entities may each begin with a text declaration.

    Text Declaration
        TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

    The text declaration must be provided literally, not by reference to a parsed entity. No text declaration may appear at any position other than the beginning of an external parsed entity.

    Well-Formed Parsed Entities

    The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. An external parameter entity is well-formed if it matches the production labeled extPE.

    Well-Formed External Parsed Entity
        extParsedEnt ::= TextDecl? content
        extPE        ::= TextDecl? extSubsetDecl

    An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition.

    A consequence of well-formedness in entities is that the logical and physical structures in an XML document are properly nested; no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

    Character Encoding in Entities

    Each external parsed entity in an XML document may use a different encoding for its characters. All XML processors must be able to read entities in either UTF-8 or UTF-16.

    Entities encoded in UTF-16 must begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.
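    Telling the two mandatory encodings apart by the Byte Order Mark can be sketched as below. The sketch assumes the entity is in one of the two mandatory encodings and ignores any encoding declaration; the function name sniff_unicode_encoding is an assumption made for this example.

```python
import codecs


def sniff_unicode_encoding(data: bytes) -> str:
    """Return "UTF-16" if data starts with a Byte Order Mark, else "UTF-8"."""
    # #xFEFF serialized big-endian is FE FF; little-endian is FF FE.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    return "UTF-8"


if __name__ == "__main__":
    # Python's "utf-16" codec writes a Byte Order Mark when encoding.
    print(sniff_unicode_encoding("<greeting/>".encode("utf-16")))
    print(sniff_unicode_encoding("<greeting/>".encode("utf-8")))
```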

    Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration:

    Encoding Declaration
        EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
        EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*    /* Encoding name contains only Latin characters */

    In the document entity, the encoding declaration is part of the XML declaration. The EncName is the name of the encoding used.

    In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. XML processors may recognize other encodings; it is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority , other than those just listed, should be referred to using their registered names. Note that these registered names are defined to be case-insensitive, so processors wishing to match against them should do so in a case-insensitive way.

    In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, for an encoding declaration to occur other than at the beginning of an external entity, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.

    It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process.

    Examples of encoding declarations:

        <?xml encoding='UTF-8'?>
        <?xml encoding='EUC-JP'?>

    XML Processor Treatment of Entities and References

    The table below summarizes the contexts in which character references, entity references, and invocations of unparsed entities might appear and the required behavior of an XML processor in each case. The labels in the leftmost column describe the recognition context:

    Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.

    Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.

    Occurs as Attribute Value: as a Name, not a reference, appearing either as the value of an attribute which has been declared as type ENTITY, or as one of the space-separated tokens in the value of an attribute which has been declared as type ENTITIES.

    Reference in EntityValue: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.

    Reference in DTD: as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue or AttValue.

    Entity Type
                                  Parameter            Internal General     External Parsed General  Unparsed   Character
    Reference in Content          Not recognized       Included             Included if validating   Forbidden  Included
    Reference in Attribute Value  Not recognized       Included in literal  Forbidden                Forbidden  Included
    Occurs as Attribute Value     Not recognized       Forbidden            Forbidden                Notify     Not recognized
    Reference in EntityValue      Included in literal  Bypassed             Bypassed                 Forbidden  Included
    Reference in DTD              Included as PE       Forbidden            Forbidden                Forbidden  Forbidden

    Not Recognized

    Outside the DTD, the % character has no special significance; thus, what would be parameter entity references in the DTD are not recognized as markup in content. Similarly, the names of unparsed entities are not recognized except when they appear in the value of an appropriately declared attribute.

    Included

    An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way, except that the replacement text of entities used to escape markup delimiters (the entities amp, lt, gt, apos, quot) is always treated as data. (The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.) A character reference is included when the indicated character is processed in place of the reference itself.

    Included If Validating

    When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor must include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text. If a non-validating parser does not include the replacement text, it must inform the application that it recognized, but did not read, the entity.

    This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

    Forbidden

    The following are forbidden, and constitute fatal errors:

    the appearance of a reference to an unparsed entity.

    the appearance of any character or general-entity reference in the DTD except within an EntityValue or AttValue.

    a reference to an external entity in an attribute value.

    Included in Literal

    When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text is processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. For example, this is well-formed:

    <!ENTITY % YN '"Yes"' >
    <!ENTITY WhatHeSaid "He said %YN;" >

    while this is not:

    <!ENTITY EndAttr "27'" >
    <element attribute='a-&EndAttr;>

    Notify

    When the name of an unparsed entity appears as a token in the value of an attribute of declared type ENTITY or ENTITIES, a validating processor must inform the application of the system and public (if any) identifiers for both the entity and its associated notation.

    Bypassed

    When a general entity reference appears in the EntityValue in an entity declaration, it is bypassed and left as is.

    Included as PE

    Just as with external parsed entities, parameter entities need only be included if validating. When a parameter-entity reference is recognized in the DTD and included, its replacement text is enlarged by the attachment of one leading and one following space (#x20) character; the intent is to constrain the replacement text of parameter entities to contain an integral number of grammatical tokens in the DTD.

    Construction of Internal Entity Replacement Text

    In discussing the treatment of internal entities, it is useful to distinguish two forms of the entity's value. The literal entity value is the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. The replacement text is the content of the entity, after replacement of character references and parameter-entity references.

    The literal entity value as given in an internal entity declaration (EntityValue) may contain character, parameter-entity, and general-entity references. Such references must be contained entirely within the literal entity value. The actual replacement text that is included as described above must contain the replacement text of any parameter entities referred to, and must contain the character referred to, in place of any character references in the literal entity value; however, general-entity references must be left as-is, unexpanded. For example, given the following declarations:

    <!ENTITY % pub    "&#xc9;ditions Gallimard" >
    <!ENTITY   rights "All rights reserved" >
    <!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" >

    then the replacement text for the entity "book" is:

    La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;

    The general-entity reference "&rights;" would be expanded should the reference "&book;" appear in the document's content or an attribute value.
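
    The construction of replacement text can be sketched in a few lines of Python. This is a simplified illustration, not a conforming implementation: the function name replacement_text is hypothetical, the parameter-entity table is passed in as a plain dictionary, and the literal values used are those of the "book" example from the XML 1.0 specification. Character references and parameter-entity references are replaced in a single pass, while general-entity references are left untouched.

```python
import re

# Character references (hex and decimal) and parameter-entity references.
# A general-entity reference such as "&rights;" deliberately does not match.
REFERENCE = re.compile(
    r"&#x(?P<hex>[0-9a-fA-F]+);|&#(?P<dec>[0-9]+);|%(?P<pe>[A-Za-z_:][\w.:-]*);"
)


def replacement_text(literal, parameter_entities):
    """Build the replacement text of an internal entity from its literal value."""
    def expand(match):
        if match.group("hex"):
            return chr(int(match.group("hex"), 16))
        if match.group("dec"):
            return chr(int(match.group("dec")))
        # Parameter entity: substitute its already-built replacement text.
        return parameter_entities[match.group("pe")]

    return REFERENCE.sub(expand, literal)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;", {"pub": pub}
)
print(book)
```

    Because the substituted text is not rescanned, "&rights;" survives unchanged in the result, which matches the "Bypassed" behavior described for general-entity references in an EntityValue.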

    These simple rules may have complex interactions; for a detailed discussion of a difficult example, see the appendix "Expansion of Entity and Character References".

    Predefined Entities

    Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

    All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them. If the entities in question are declared, they must be declared as internal entities whose replacement text is the single character being escaped or a character reference to that character, as shown below.

    <!ENTITY lt     "&#38;#60;">
    <!ENTITY gt     "&#62;">
    <!ENTITY amp    "&#38;#38;">
    <!ENTITY apos   "&#39;">
    <!ENTITY quot   "&#34;">

    Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.
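
    As an illustration of escaping with the predefined entities, the following Python sketch uses the standard library functions escape and unescape from xml.sax.saxutils. By default these handle only &, < and >; the extra mapping passed for quotes is an explicit choice made for this example.

```python
from xml.sax.saxutils import escape, unescape

text = "AT&T <research>"

# Escape the markup delimiters so the string can appear as character data.
escaped = escape(text)
print(escaped)

# Quotes are only escaped when an extra mapping is supplied.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)

# unescape reverses the default replacements.
assert unescape(escaped) == text
```

    The ampersand is replaced first, so text that already contains an entity reference is escaped a second time rather than passed through.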

    Notation Declarations

    Notations identify by name the format of unparsed entities, the format of elements which bear a notation attribute, or the application to which a processing instruction is addressed.

    Notation declarations provide a name for the notation, for use in entity and attribute-list declarations and in attribute specifications, and an external identifier for the notation which may allow an XML processor or its client application to locate a helper application capable of processing data in the given notation.

    Notation Declarations

    NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
    PublicID     ::= 'PUBLIC' S PubidLiteral

    XML processors must provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration. They may additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described. (It is not an error, however, for XML documents to declare and refer to notations for which notation-specific applications are not available on the system where the XML processor or application is running.)

    Document Entity

    The document entity serves as the root of the entity tree and a starting-point for an XML processor. This specification does not specify how the document entity is to be located by an XML processor; unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

    Conformance

    Validating and Non-Validating Processors

    Conforming XML processors fall into two classes: validating and non-validating.

    Validating and non-validating processors alike must report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.

    Validating processors must report violations of the constraints expressed by the declarations in the DTD, and failures to fulfill the validity constraints given in this specification. To accomplish this, validating XML processors must read and process the entire DTD and all external parsed entities referenced in the document.

    Non-validating processors are required to check only the document entity, including the entire internal DTD subset, for well-formedness. While they are not required to check the document for validity, they are required to process all the declarations they read in the internal DTD subset and in any parameter entity that they read, up to the first reference to a parameter entity that they do not read; that is to say, they must use the information in those declarations to normalize attribute values, include the replacement text of internal entities, and supply default attribute values. They must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.

    Using XML Processors

    The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations. Less is required of a non-validating processor; it need not read any part of the document other than the document entity. This has two effects that may be important to users of XML processors:

    Certain well-formedness errors, specifically those that require reading external entities, may not be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in the section "XML Processor Treatment of Entities and References".

    The information passed from the processor to the application may vary, depending on whether the processor reads parameter and external entities. For example, a non-validating processor may not normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities.

    For maximum reliability in interoperating between different XML processors, applications which use non-validating processors should not rely on any behaviors not required of such processors. Applications which require facilities such as the use of default attributes or internal entities which are declared in external entities should use validating XML processors.

    Notation

    The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form symbol ::= expression

    Symbols are written with an initial capital letter if they are defined by a regular expression, or with an initial lower case letter otherwise. Literal strings are quoted.

    Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

    #xN: where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

    [a-zA-Z], [#xN-#xN]: matches any character with a value in the range(s) indicated (inclusive).

    [^a-z], [^#xN-#xN]: matches any character with a value outside the range indicated.

    [^abc], [^#xN#xN#xN]: matches any character with a value not among the characters given.

    "string": matches a literal string matching that given inside the double quotes.

    'string': matches a literal string matching that given inside the single quotes.

    These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

    (expression): expression is treated as a unit and may be combined as described in this list.

    A?: matches A or nothing; optional A.

    A B: matches A followed by B.

    A | B: matches A or B but not both.

    A - B: matches any string that matches A but does not match B.

    A+: matches one or more occurrences of A.

    A*: matches zero or more occurrences of A.

    Other notations used in the productions are:

    /* ... */: comment.

    [ wfc: ... ]: well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

    [ vc: ... ]: validity constraint; this identifies by name a constraint on valid documents associated with a production.

    References

    Normative References

    IANA (Internet Assigned Numbers Authority). Official Names for Character Sets, ed. Keld Simonsen et al. See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

    IETF (Internet Engineering Task Force). RFC 1766: Tags for the Identification of Languages, ed. H. Alvestrand. 1995.

    ISO (International Organization for Standardization). ISO 639:1988 (E). Code for the representation of names of languages. [Geneva]: International Organization for Standardization, 1988.

    ISO (International Organization for Standardization). ISO 3166-1:1997 (E). Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes. [Geneva]: International Organization for Standardization, 1997.

    ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).

    The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996.

    Other References

    Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Reading: Addison-Wesley, 1986, rpt. corr. 1988.

    Berners-Lee, T., R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax and Semantics. 1997. (Work in progress; see updates to RFC1738.)

    Brüggemann-Klein, Anne. Regular Expressions into Finite Automata. Extended abstract in I. Simon, Hrsg., LATIN 1992, S. 97-98. Springer-Verlag, Berlin 1992. Full Version in Theoretical Computer Science 120: 197-213, 1993.

    Brüggemann-Klein, Anne, and Derick Wood. Deterministic Regular Languages. Universität Freiburg, Institut für Informatik, Bericht 38, Oktober 1991.

    James Clark. Comparison of SGML and XML. See http://www.w3.org/TR/NOTE-sgml-xml-971215.

    IETF (Internet Engineering Task Force). RFC 1738: Uniform Resource Locators (URL), ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994.

    IETF (Internet Engineering Task Force). RFC 1808: Relative Uniform Resource Locators, ed. R. Fielding. 1995.

    IETF (Internet Engineering Task Force). RFC 2141: URN Syntax, ed. R. Moats. 1997.

    ISO (International Organization for Standardization). ISO 8879:1986(E). Information processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML). First edition -- 1986-10-15. [Geneva]: International Organization for Standardization, 1986.

    ISO (International Organization for Standardization). ISO/IEC 10744-1992 (E). Information technology -- Hypermedia/Time-based Structuring Language (HyTime). [Geneva]: International Organization for Standardization, 1992. Extended Facilities Annexe. [Geneva]: International Organization for Standardization, 1996.

    Character Classes

    Following the characteristics defined in the Unicode standard, characters are classed as base characters (among others, these contain the alphabetic characters of the Latin alphabet, without diacritics), ideographic characters, and combining characters (among others, this class contains most diacritics); these classes combine to form the class of letters. Digits and extenders are also distinguished. Characters Letter BaseChar | Ideographic BaseChar [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] | [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] | [#x0B99-#x0B9A] | 
#x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3] Ideographic [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029] CombiningChar [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C | [#x093E-#x094C] 
| #x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] | #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 | #x309A Digit [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29] Extender #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

    The character classes defined here can be derived from the Unicode character database as follows:

    Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

    Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

    Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.

    Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

    The following characters are treated as name-start characters rather than name characters, because the property file classifies them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.

    Characters #x20DD-#x20E0 are excluded (in accordance with Unicode, section 5.14).

    Character #x00B7 is classified as an extender, because the property list so identifies it.

    Character #x0387 is added as a name character, because #x00B7 is its canonical equivalent.

    Characters ':' and '_' are allowed as name-start characters.

    Characters '-' and '.' are allowed as name characters.
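
    The derivation rules above can be approximated with Python's unicodedata module. This is only a sketch under stated assumptions: Python ships a much newer Unicode database than the Unicode 2.0 data the productions were derived from, so results for non-ASCII characters may differ from the normative ranges, and the individual exceptions and extenders listed above are not modeled. The function names are hypothetical.

```python
import unicodedata

START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
OTHER_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def _excluded(ch):
    """Compatibility-area characters and compatibility decompositions are not allowed."""
    if 0xF900 < ord(ch) < 0xFFFE:
        return True
    return unicodedata.decomposition(ch).startswith("<")


def is_name_start(ch):
    if ch in (":", "_"):
        return True
    return unicodedata.category(ch) in START_CATEGORIES and not _excluded(ch)


def is_name_char(ch):
    if is_name_start(ch) or ch in ("-", "."):
        return True
    return unicodedata.category(ch) in OTHER_CATEGORIES and not _excluded(ch)


def is_name(text):
    """Approximate check that text is an XML Name."""
    return (
        bool(text)
        and is_name_start(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )


print(is_name("xsl:template"), is_name("-foo"), is_name("1abc"))
```

    For conformance work the explicit ranges in the productions remain authoritative; the category test is only a convenient first approximation.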

    XML and SGML

    XML is designed to be a subset of SGML, in that every valid XML document should also be a conformant SGML document. For a detailed comparison of the additional restrictions that XML places on documents beyond those of SGML, see James Clark, Comparison of SGML and XML, listed in the references.

    Expansion of Entity and Character References

    This appendix contains some examples illustrating the sequence of entity- and character-reference recognition and expansion, as specified in the section "XML Processor Treatment of Entities and References".

    If the DTD contains the declaration

    <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >

    then the XML processor will recognize the character references when it parses the entity declaration, and resolve them before storing the following string as the value of the entity "example":

    <p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>

    A reference in the document to "&example;" will cause the text to be reparsed, at which time the start- and end-tags of the "p" element will be recognized and the three references will be recognized and expanded, resulting in a "p" element with the following content (all data, no delimiters or markup):

    An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
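
    The two-stage expansion of the "example" entity can be observed with Python's xml.etree.ElementTree, which relies on a non-validating parser that processes the internal DTD subset. The wrapper element name "doc" is an assumption made for this sketch; the entity declaration is the one from the XML 1.0 specification, written on a single line.

```python
import xml.etree.ElementTree as ET

# Raw string so the character references reach the parser unchanged.
document = r"""<!DOCTYPE doc [
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
]>
<doc>&example;</doc>"""

root = ET.fromstring(document)

# The reference to &example; is reparsed, so a real "p" element appears.
paragraph = root.find("p")
print(paragraph.text)
```

    The first level of character references is resolved when the declaration is read, and the second level only when the reference to the entity is expanded in content.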

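    A more complex example will illustrate the rules and their effects fully. In the following example, the line numbers are solely for reference.

    1 <?xml version='1.0'?>
    2 <!DOCTYPE test [
    3 <!ELEMENT test (#PCDATA) >
    4 <!ENTITY % xx '&#37;zz;'>
    5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
    6 %xx;
    7 ]>
    8 <test>This sample shows a &tricky; method.</test>

    This produces the following: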

    in line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since "zz" is not yet declared.)

    in line 5, the character reference "&#60;" is expanded immediately and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.

    in line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed. The reference to "zz" is recognized in its turn, and its replacement text ("<!ENTITY tricky "error-prone" >") is parsed. The general entity "tricky" has now been declared, with the replacement text "error-prone".

    in line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the "test" element is the self-describing (and ungrammatical) string This sample shows a error-prone method.

    Deterministic Content Models

    For compatibility, it is required that content models in element type declarations be deterministic.

    SGML requires deterministic content models (it calls them "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

    For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser cannot know which b in the model is being matched without looking ahead to see which element follows the b. In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)). An initial b now clearly matches only a single name in the content model. The parser doesn't need to look ahead to see what follows; either c or d would be accepted.

    More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.
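
    The follow-set test can be sketched in Python. This is a minimal illustration rather than the algorithm from Aho, Sethi, and Ullman verbatim: content models are assumed to be given as nested tuples such as ("seq", a, b), ("alt", a, b), ("opt", a), ("star", a), ("plus", a) and ("sym", name), a representation invented for this example, and the set of first positions is checked in the same way as the follow sets.

```python
from itertools import count


def analyse(model):
    """Return (first, follow, names) for a content model given as nested tuples."""
    names = {}    # position -> element type name
    follow = {}   # position -> set of positions that may come next
    counter = count()

    def walk(node):
        """Return (nullable, first, last) for node, filling names and follow."""
        kind = node[0]
        if kind == "sym":
            pos = next(counter)
            names[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind in ("seq", "alt"):
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind in ("opt", "star", "plus"):
            n, f, l = walk(node[1])
            if kind != "opt":
                for p in l:
                    follow[p] |= f   # repetition: last positions loop back to first
            return (n or kind != "plus"), f, l
        raise ValueError("unknown node kind: %r" % (kind,))

    _, first, _ = walk(model)
    return first, follow, names


def is_deterministic(model):
    """True if no first or follow set contains two positions with the same name."""
    first, follow, names = analyse(model)
    for positions in [first, *follow.values()]:
        labels = [names[p] for p in positions]
        if len(labels) != len(set(labels)):
            return False
    return True


b, c, d = ("sym", "b"), ("sym", "c"), ("sym", "d")
print(is_deterministic(("alt", ("seq", b, c), ("seq", b, d))))   # ((b, c) | (b, d))
print(is_deterministic(("seq", b, ("alt", c, d))))               # (b, (c | d))
```

    The first model is rejected because two distinct positions labeled b can start the content, while the rewritten model has a single b followed by either c or d.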

    Algorithms exist which allow many but not all non-deterministic content models to be reduced automatically to equivalent deterministic models; see Brüggemann-Klein 1991.

    Autodetection of Character Encodings

    The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself. Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying (external) information. We consider the first case first.

    Because each XML entity not in UTF-8 or UTF-16 format must begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF".

    00 00 00 3C: UCS-4, big-endian machine (1234 order)

    3C 00 00 00: UCS-4, little-endian machine (4321 order)

    00 00 3C 00: UCS-4, unusual octet order (2143)

    00 3C 00 00: UCS-4, unusual octet order (3412)

    FE FF: UTF-16, big-endian

    FF FE: UTF-16, little-endian

    00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 00 3F 00: UTF-16, little-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the ASCII characters, the encoding declaration itself may be read reliably

    4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)

    other: UTF-8 without an encoding declaration, or else the data stream is corrupt, fragmentary, or enclosed in a wrapper of some kind
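
    The list above translates directly into a lookup on the first octets of the entity. The following Python sketch is an illustration only; the function name detect_encoding_family and the returned labels are hypothetical, and the result names a family of encodings, so the encoding declaration must still be read afterwards.

```python
# Four-octet signatures from the list above.
FOUR_OCTETS = {
    b"\x00\x00\x00\x3c": "ucs-4-be",
    b"\x3c\x00\x00\x00": "ucs-4-le",
    b"\x00\x00\x3c\x00": "ucs-4-2143",
    b"\x00\x3c\x00\x00": "ucs-4-3412",
    b"\x00\x3c\x00\x3f": "utf-16-be-no-bom",
    b"\x3c\x00\x3f\x00": "utf-16-le-no-bom",
    b"\x3c\x3f\x78\x6d": "ascii-compatible",
    b"\x4c\x6f\xa7\x94": "ebcdic",
}


def detect_encoding_family(data):
    """Guess the encoding family from the first two to four octets of an entity."""
    head = bytes(data[:4])
    if head in FOUR_OCTETS:
        return FOUR_OCTETS[head]
    if head[:2] == b"\xfe\xff":
        return "utf-16-be"
    if head[:2] == b"\xff\xfe":
        return "utf-16-le"
    # UTF-8 without an encoding declaration, or corrupt or wrapped data.
    return "utf-8-or-unknown"


print(detect_encoding_family(b"<?xml encoding='EUC-JP'?>"))
print(detect_encoding_family(b"\xfe\xff\x00<"))
```

    Once the family is known, the declaration can be decoded with any member of that family, since its contents are restricted to ASCII characters.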

    This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859 from each other, or to distinguish the specific EBCDIC code page in use, and so on).

    Because the contents of the encoding declaration are restricted to ASCII characters, a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable.

    Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input.

    Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration. Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity.

    The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended.

    If an XML entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.
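The three recommended rules can be read as a small decision procedure. The following Python sketch is one possible reading. The function name, its parameters, and the fallback used when text/xml arrives without a charset parameter are assumptions for illustration; the text above leaves that last case to error recovery.

```python
def effective_encoding(delivery, charset=None, bom=None, declared=None):
    """Pick the encoding for an entity under the recommended rules.

    delivery: "file", "text/xml" or "application/xml"
    charset:  charset parameter from the MIME header, if any
    bom:      encoding implied by a Byte-Order Mark, if any
    declared: name from the encoding declaration, if any
    """
    if delivery == "text/xml" and charset:
        # The external label wins; in-band information is ignored.
        return charset
    # Files and application/xml: in-band information decides.
    # (Also used here, as an assumed recovery path, for text/xml
    # delivered without a charset parameter.)
    return declared or bom or "UTF-8"
```

Under this reading, `effective_encoding("text/xml", charset="iso-8859-1", declared="utf-8")` yields the header's value, while the same arguments with `"application/xml"` yield the declared one.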

    W3C XML Working Group

    This specification was prepared and approved for publication by the W3C XML Working Group (WG). WG approval of this specification does not necessarily imply that all WG members voted for its approval. The current and former members of the XML WG are:

    Jon Bosak, Sun (Chair)
    James Clark (Technical Lead)
    Tim Bray, Textuality and Netscape (XML Co-editor)
    Jean Paoli, Microsoft (XML Co-editor)
    C. M. Sperberg-McQueen, U. of Ill. (XML Co-editor)
    Dan Connolly, W3C (W3C Liaison)
    Paula Angerstein, Texcel
    Steve DeRose, INSO
    Dave Hollander, HP
    Eliot Kimber, ISOGEN
    Eve Maler, ArborText
    Tom Magliery, NCSA
    Murray Maloney, Muzmo and Grif
    Makoto Murata, Fuji Xerox Information Systems
    Joel Nava, Adobe
    Conleth O'Connell, Vignette
    Peter Sharpe, SoftQuad
    John Tigue, DataChannel
    XML-XSLT-0.48/examples/XSLT.html

    NAME

    XML::XSLT - A perl module for processing XSLT

    SYNOPSIS

    use XML::XSLT;
    
    my $xslt = XML::XSLT->new ($xsl, warnings => 1);
    
    $xslt->transform ($xmlfile);
    
    print $xslt->toString;
    
    $xslt->dispose();
    
        

    DESCRIPTION

    This module implements the W3C's XSLT specification. The goal is full implementation of this spec, but we have not yet achieved that. However, it already works well. See XML::XSLT Commands for the current status of each command.

    XML::XSLT makes use of XML::DOM and LWP::Simple, while XML::DOM uses XML::Parser. Therefore XML::Parser, XML::DOM and LWP::Simple have to be installed properly for XML::XSLT to run.

    Specifying Sources

    The stylesheets and the documents may be passed as filenames, file handles, regular strings, string references or DOM trees. Functions that require sources (e.g. new) will accept either a named parameter or simply the argument.

    Either of the following is allowed:

    my $xslt = XML::XSLT->new($xsl);
    my $xslt = XML::XSLT->new(Source => $xsl);
    
        

    In documentation, the named parameter `Source' is always shown, but it is never required.

    METHODS

    XML::XSLT Commands

    SUPPORT

    General information, bug reporting tools, the latest version, mailing lists, etc. can be found at the XML::XSLT homepage:

    http://xmlxslt.sourceforge.net/
    
        

    DEPRECATIONS

    Methods and interfaces from previous versions that are not documented in this version are deprecated. Each of them can still be used, but will produce a warning the first time it is used. You can use the old interfaces without warnings by passing new() the flag use_deprecated. Example:

    $parser = XML::XSLT->new($xsl, "FILE",
                             use_deprecated => 1);
    
        

    The deprecated methods will disappear by the time a 1.0 release is made.

    The deprecated methods are :

    BUGS

    Yes.

    HISTORY

    Geert Josten and Egon Willighagen developed and maintained XML::XSLT up to version 0.22. At that point, Mark Hershberger started moving the project to Sourceforge and began working on it with Bron Gondwana.

    LICENCE

    Copyright (c) 1999 Geert Josten & Egon Willighagen. All Rights Reserved. This module is free software, and may be distributed under the same terms and conditions as Perl.

    AUTHORS

    Geert Josten <gjosten@sci.kun.nl>

    Egon Willighagen <egonw@sci.kun.nl>

    Mark A. Hershberger <mah@everybody.org>

    Bron Gondwana <perlcode@brong.net>

    Jonathan Stowe <jns@gellyfish.com>

    SEE ALSO

    XML::DOM, LWP::Simple, XML::Parser

    XML-XSLT-0.48/examples/test.xml

    hoi piepeloi! Dit is wat test tekst... Nieuwjaarsborrel 4/1/1999 Subfaculteit Scheikunde kantine B-faculteit 16.30 Informed Chemistry: what can it do for synthesis? 13/1/1999 Chemweb.Com Internet 16.00 "Nieuwe materialen op basis van organische synthese" 2/2/1999 NSR Spreker: dr. Frank van Veggel, Laboratorium voor organische chemie, Universiteit Twente
    Gastheer: Prof. dr. RJM Nolte
    CZ I 14.00
    Paaslympics 5/4/1999 St. Beet en BeeVee W en N Paas-Beestborrel 6/4/1999 BBB en Leonardo X-Files: Fight the Future 6/4/1999 St. Beet CZ N2 19.30u 1,50 Geert Josten!?!?
    XML-XSLT-0.48/examples/bernhard.xsl
    Belastungsgebiet(e):

    XML-XSLT-0.48/examples/identity.xsl_org

    XML-XSLT-0.48/examples/REC-xml-19980210.xml

    Extensible Markup Language (XML) 1.0
    REC-xml-&iso6.doc.date;
    W3C Recommendation &draft.day;&draft.month;&draft.year;

    This version:
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps

    Latest version:
    http://www.w3.org/TR/REC-xml

    Previous version:
    http://www.w3.org/TR/PR-xml-971208

    Editors:
    Tim Bray, Textuality and Netscape <tbray@textuality.com>
    Jean Paoli, Microsoft <jeanpa@microsoft.com>
    C. M. Sperberg-McQueen, University of Illinois at Chicago <cmsmcq@uic.edu>

    The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

    This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

    This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

    This specification uses the term URI, which is defined by Berners-Lee et al., a work in progress expected to update IETF RFC 1738 and IETF RFC 1808.

    The list of known errors in this specification is available at http://www.w3.org/XML/xml-19980210-errata.

    Please report errors in this document to xml-editor@w3.org.

    Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML Working Group, 1996, 1997.

    Created in electronic form.

    English; Extended Backus-Naur Form (formal grammar)

    1997-12-03 : CMSMcQ : yet further changes
    1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)
    1997-12-02 : CMSMcQ : deal with as many corrections and comments from the proofreaders as possible: entify hard-coded document date in pubdate element, change expansion of entity WebSGML, update status description as per Dan Connolly (am not sure about reference to Berners-Lee et al.), add 'The' to abstract as per WG decision, move Relationship to Existing Standards to back matter and combine with References, re-order back matter so normative appendices come first, re-tag back matter so informative appendices are tagged informdiv1, remove XXX XXX from list of 'normative' specs in prose, move some references from Other References to Normative References, add RFC 1738, 1808, and 2141 to Other References (they are not normative since we do not require the processor to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee et al.), move notation section to end of body, drop URIchar non-terminal and use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls', move reference to Aho et al. into appendix (Tim's right), add prose note saying that hash marks and fragment identifiers are NOT part of the URI formally speaking, and are NOT legal in system identifiers (processor 'may' signal an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his own, Eve Maler. NOT DONE YET: change binary / text to unparsed / parsed; handle James's suggestion about < in attribute values; uppercase hex characters; namechar list.
    1997-12-01 : JB : add some column-width parameters
    1997-12-01 : CMSMcQ : begin round of changes to incorporate recent WG decisions and other corrections: binding sources of character encoding info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped line), drop SDD from EncodingDecl, change text at version number 1.0, drop misleading (wrong!) sentence about ignorables and extenders, modify definition of PCData to make bar on msc grammatical, change grammar's handling of internal subset (drop non-terminal markupdecls), change definition of includeSect to allow conditional sections, add integral-declaration constraint on internal subset, drop misleading / dangerous sentence about relationship of entities with system storage objects, change table body tag to htbody as per EM change to DTD, add rule about space normalization in public identifiers, add description of how to generate our name-space rules from Unicode character database (needs further work!).
    1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance.
    1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs, lotsa little edits for style
    1997-09-25 : TB : Change to elm's new DTD, with substantial detail cleanup as a side-effect
    1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents (thanks to Makoto Murata). Allow all empty elements to have end-tags, consistent with SGML TC (as per JJC).
    1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce the term 'empty-element tag', note that all empty elements may use it, and elements declared EMPTY must use it. Add WFC requiring encoding decl to come first in an entity. Redefine notations to point to PIs as well as binary entities. Change autodetection table by removing bytes 3 and 4 from examples with Byte Order Mark. Add content model as a term and clarify that it applies to both mixed and element content.
    1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to productions for choice, seq, Mixed, NotationType, Enumeration. Follow James Clark's suggestion and prohibit conditional sections in internal subset. TO DO: simplify production for ignored sections as a result, since we don't need to worry about parsers which don't expand PErefs finding a conditional section.
    1997-06-29 : TB : various edits
    1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments and some dead material. Revise occurrences of % in grammar to exploit Henry Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating to element content (?).
    1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for draconian error handling (introduce the term Fatal Error). RE deleta est (changing wording from original announcement to restrict the requirement to validating parsers). Tag definition of validating processor and link to it. Add colon as name character. Change def of %operator. Change standard definitions of lt, gt, amp. Strip leading zeros from #x00nn forms.
    1997-04-02 : CMSMcQ : final corrections of editorial errors found in last night's proofreading. Reverse course once more on well-formed: Webster's Second hyphenates it, and that's enough for me.
    1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self
    1997-03-31 : Tim Bray : many changes
    1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations). Changed Ident element to accept def attribute. Allow normalization of Unicode characters. Move def of systemliteral into section on literals.
    1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso, and self. Among other things: give in on "well formed" (Terry is right), tentatively rename QuotedCData as AttValue and Literal as EntityValue to be more informative, since attribute values are the only place QuotedCData was used, and vice versa for entity text and Literal. (I'd call it Entity Text, but 8879 uses that name for both internal and external entities.)
    1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except in the one case where it meant 'may or may not'.
    1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver
    1997-03-21 : CMSMcQ : correct as many reported errors as possible.
    1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.
    1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for WWW conference April 1997: restore some of the internal entity references (e.g. to docdate, etc.), change character xA0 to &nbsp; and define nbsp as &#160;, and refill a lot of paragraphs for legibility.
    1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames, Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section' as 'CDATA section' passim. Also edits from James Clark: Define the set of characters from which [^abc] subtracts. Charref should use just [0-9] not Digit. Location info needs cleaner treatment: remove? (ERB question). One example of a PI has wrong pic. Clarify discussion of encoding names. Encoding failure should lead to unspecified results; don't prescribe error recovery. Don't require exposure of entity boundaries. Ignore white space in element content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And some of my own: Correct productions for content model: model cannot consist of a name, so "elements ::= cp" is no good.
    1996-11-11 : CMSMcQ : revise for style. Add new rhs to entity declaration, for parameter entities.
    1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names, characters. Add sections on parameter entities, conditional sections. Still to do: Add compatibility note on deterministic content models. Finish stylistic revision.
    1996-10-31 : TB : Add Entity Handling section
    1996-10-30 : TB : Clean up term & termdef. Slip in ERB decision re EMPTY.
    1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions. Change comments back to //. Introduce language for XML namespace reservation. Add section on white-space handling. Lots more cleanup.
    1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters are not integers. Comments are /* */ not //. Add bibliographic refs to 10646, HyTime, Unicode. Rename old Cdata as MsData since it's only seen in marked sections. Call them attribute-value pairs not name-value pairs, except once. Internal subset is optional, needs '?'. Implied attributes should be signaled to the app, not have values supplied by processor.
    1996-10-16 : TB : track down & excise all DSD references; introduce some EBNF for entity declarations.
    1996-10-?? : TB : consistency check, fix up scraps so they all parse, get formatter working, correct a few productions.
    1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational changes: Replace a few literals with xmlpio and pic entities, to make them consistent and ensure we can change pic reliably when the ERB votes. Drop paragraph on recognizers from notation section. Add match, exact match to terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments, PIs, and marked sections in discussion of delimiter escaping. Streamline discussion of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl, and add section on partial-DTD summary PIs to end of Logical Structures section. Revise DSD syntax section to use Tim's subset-in-a-PI mechanism.
    1996-10-10 : TB : eliminate name recognizers (and more?)
    1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters)
    1996-10-09 : CMSMcQ : re-unite everything for convenience, at least temporarily, and revise quickly
    1996-10-08 : TB : first major homogenization pass
    1996-10-08 : TB : turn "current" attribute on div type into CDATA
    1996-10-02 : TB : remould into skeleton + entities
    1996-09-30 : CMSMcQ : add a few more sections prior to exchange with Tim.
    1996-09-20 : CMSMcQ : finish transcribing notes.
    1996-09-19 : CMSMcQ : begin transcribing notes for draft.
    1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping

    Introduction

    Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents.

    XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

    A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

    Origin and Goals

    XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

    The design goals for XML are:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness in XML markup is of minimal importance.

    This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version &XML.version; and construct computer programs to process it.

    This version of the XML specification &doc.distribution;.

    Terminology

    The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

    may: Conforming documents and XML processors are permitted to but need not behave as described.

    must: Conforming documents and XML processors are required to behave as described; otherwise they are in error.

    error: A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

    fatal error: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

    at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

    validity constraint: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must, at user option, be reported by validating XML processors.

    well-formedness constraint: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.

    match: (Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. At user option, processors may normalize such characters to some canonical form. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production. (Of content and content models:) An element matches its declaration when it conforms in the fashion described in the constraint Element Valid.

    for compatibility: A feature of XML included solely to ensure that XML remains compatible with SGML.

    for interoperability: A non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the &WebSGML;.

    Documents

    A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.

    Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in Well-Formed Parsed Entities.

    Well-Formed XML Documents

    A textual object is a well-formed XML document if:

    Taken as a whole, it matches the production labeled document.

    It meets all the well-formedness constraints given in this specification.

    Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

    Document

    document ::= prolog element Misc*

    Matching the document production implies that:

    It contains one or more elements.

    There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

    As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.
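The nesting rule and the resulting parent/child relation can be observed with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper names is_well_formed and parent_map are made up for this example, and the parser checks far more than the nesting rule alone.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def parent_map(xml_text):
    """Return (root, {child element: parent element})."""
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    return root, parents
```

With these helpers, `<a><b></b></a>` is accepted, while `<a><b></a></b>` (improper nesting) and `<a/><b/>` (two top-level elements) are rejected. Every element except the root appears exactly once as a key in the parent map.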

    Characters

    A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of "compatibility characters", as defined in section 6.8 of Unicode, is discouraged.

    Character Range

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
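The Char production translates directly into a range check on code points. A minimal Python sketch follows; the function names are invented for the example.

```python
def is_xml_char(code_point: int) -> bool:
    """True if the code point matches the Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return the index of the first character outside Char, or None."""
    for index, character in enumerate(text):
        if not is_xml_char(ord(character)):
            return index
    return None
```

Note that most C0 control characters (for example form feed, #xC) fall outside the production, as do the surrogate blocks and the code points FFFE and FFFF.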

    The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in Character Encoding in Entities.

    Common Syntactic Constructs

    This section defines some symbols used widely in the grammar.

    S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

    White Space

    S ::= (#x20 | #x9 | #xD | #xA)+

    Characters are classified for convenience as letters, digits, or other characters. Letters consist of an alphabetic or syllabic base character possibly followed by one or more combining characters, or of an ideographic character. Full definitions of the specific characters in each class are given in Character Classes.

    A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

    The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

    An Nmtoken (name token) is any mixture of name characters.

    Names and Tokens

    NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    Name ::= (Letter | '_' | ':') (NameChar)*
    Names ::= Name (S Name)*
    Nmtoken ::= (NameChar)+
    Nmtokens ::= Nmtoken (S Nmtoken)*
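To make the difference between Name and Nmtoken concrete, here is a deliberately simplified Python sketch. It assumes ASCII-only input: Letter is narrowed to [A-Za-z], Digit to [0-9], and CombiningChar and Extender are left out entirely, so it accepts only a subset of what the productions allow. The function names are hypothetical.

```python
import re

# ASCII-only approximations of the productions (an assumption of this sketch).
NAME_CHAR = r"[A-Za-z0-9._:-]"
NAME_RE = re.compile(r"[A-Za-z_:]" + NAME_CHAR + r"*")
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")


def is_name(text: str) -> bool:
    """True if text matches the (ASCII-restricted) Name production."""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """True if text matches the (ASCII-restricted) Nmtoken production."""
    return NMTOKEN_RE.fullmatch(text) is not None


def is_reserved(text: str) -> bool:
    """True if the name starts with 'xml' in any mix of upper and lower case."""
    return text[:3].lower() == "xml"
```

The only difference between the two is the first character: `1abc` is a valid Nmtoken but not a valid Name.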

    Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

    Literals

    EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
    AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
    SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
    PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
    PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

    Character Data and Markup

    Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions.

    All text that is not markup constitutes the character data of the document.

    The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration; see Well-Formed Parsed Entities. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

    In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

    To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

    Character Data

    CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
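The escaping rules of this section can be sketched as two small Python helpers. The names escape_char_data and escape_att_value are invented for the example, and always quoting attribute values with double quotes is a choice of this sketch, not a requirement. The round trip through Python's standard xml.etree.ElementTree parser is only a sanity check.

```python
import xml.etree.ElementTree as ET


def escape_char_data(text: str) -> str:
    """Escape text for use as element content."""
    text = text.replace("&", "&amp;")      # must run first
    text = text.replace("<", "&lt;")
    text = text.replace("]]>", "]]&gt;")   # required for compatibility
    return text


def escape_att_value(value: str) -> str:
    """Return the value as a double-quoted attribute literal."""
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    value = value.replace('"', "&quot;")
    return '"' + value + '"'


def round_trip(content: str, attribute: str):
    """Serialize by hand, parse again, and return what the parser saw."""
    document = (
        "<t a=" + escape_att_value(attribute) + ">"
        + escape_char_data(content) + "</t>"
    )
    element = ET.fromstring(document)
    return element.text, element.get("a")
```

Replacing the ampersand first matters: doing it last would re-escape the ampersands introduced by the other replacements.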

    Comments

    Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.

    Comments

    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

    An example of a comment:

    <!-- declarations for <head> & <body> -->
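The Comment production maps almost one-to-one onto a regular expression. The Python sketch below is an approximation: it checks the hyphen rule only and does not verify that every character is a legal Char. The name is_comment is hypothetical.

```python
import re

# (Char - '-') becomes [^-]; ('-' (Char - '-')) becomes -[^-].
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text: str) -> bool:
    """True if the whole string is a single comment under the hyphen rule."""
    return COMMENT_RE.fullmatch(text) is not None
```

A consequence of the grammar that is easy to miss: a comment may not end in `--->`, because the hyphen before the closing delimiter would have to be followed by a non-hyphen.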

    Processing Instructions

    Processing instructions (PIs) allow documents to contain instructions for applications.

    Processing Instructions

    PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

    PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets.

    CDATA Sections

    CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":

    CDATA Sections

    CDSect ::= CDStart CData CDEnd
    CDStart ::= '<![CDATA['
    CData ::= (Char* - (Char* ']]>' Char*))
    CDEnd ::= ']]>'

    Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

    An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup:

    <![CDATA[<greeting>Hello, world!</greeting>]]>
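Because a CDATA section cannot contain "]]>" and cannot nest, text that itself contains "]]>" has to be split across two adjacent sections. The following Python sketch shows one common way to do that. The function name to_cdata is made up, and the check with xml.etree.ElementTree is only there to confirm the parser recovers the original text.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded ']]>' across sections."""
    # "]]>" becomes "]]" + end of section + new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def parsed_text(text: str) -> str:
    """Embed the CDATA form in a document and return what the parser reads."""
    return ET.fromstring("<doc>" + to_cdata(text) + "</doc>").text
```

For example, `to_cdata("<greeting>Hello, world!</greeting>")` reproduces the CDATA section shown above, and text containing "]]>" survives a round trip through the parser unchanged.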

    Prolog and Document Type Declaration

    XML documents may, and should, begin with an XML declaration which specifies the version of XML being used. For example, the following is a complete XML document, well-formed but not valid:

    <?xml version="1.0"?>
    <greeting>Hello, world!</greeting>

    and so is this:

    <greeting>Hello, world!</greeting>

    The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML working group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.

    The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.

    The document type declaration must appear before the first element in the document.

    Prolog
    prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
    XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ")
    Eq ::= S? '=' S?
    VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
    Misc ::= Comment | PI | S

    The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.

    A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration. These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For fuller information, see .

    Document Type Definition
    doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
    markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment

    The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.

    Root Element Type

    The Name in the document type declaration must match the element type of the root element.
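
    A non-validating parser does not enforce this constraint, but an application can check it from the events the parser reports. The following sketch relies on the expat bindings in the Python standard library; the function name root_matches_doctype is hypothetical.

```python
import xml.parsers.expat

def root_matches_doctype(document):
    """Return True or False for the check, or None if there is no DOCTYPE."""
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attrs):
        if seen["root"] is None:  # only the first start-tag is the root
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    if seen["doctype"] is None:
        return None
    return seen["doctype"] == seen["root"]
```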

    Proper Declaration/PE Nesting

    Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.

    PEs in Internal Subset

    In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

    Like the internal subset, the external subset and any external parameter entities referred to in the DTD must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset. External Subset extSubset TextDecl? extSubsetDecl extSubsetDecl ( markupdecl | conditionalSect | PEReference | S )*

    The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.

    An example of an XML document with a document type declaration:

    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>

    The system identifier "hello.dtd" gives the URI of a DTD for the document.

    The declarations can also be given locally, as in this example:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

    If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.

    Standalone Document Declaration

    Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity.

    Standalone Document Declaration
    SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

    In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a document, of references to external entities, when those entities are internally declared, does not change its standalone status.

    If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.

    Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

    Standalone Document Declaration

    The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

    attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

    entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

    attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

    element types with element content, if white space occurs directly within any instance of those types.

    An example XML declaration with a standalone document declaration: <?xml version="1.0" standalone='yes'?>
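
    An application can read back what the XML declaration states. The sketch below uses the XmlDeclHandler callback of Python's expat bindings, which reports standalone as 1 for "yes", 0 for "no", and -1 when the declaration omits it; the function name read_xml_declaration is hypothetical.

```python
import xml.parsers.expat

def read_xml_declaration(document):
    """Return (version, encoding, standalone), or None without a declaration."""
    result = []
    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = (
        lambda version, encoding, standalone:
            result.append((version, encoding, standalone)))
    parser.Parse(document, True)
    return result[0] if result else None
```

    The value reported is only what the document declares; the parser does not verify that the declaration is accurate with respect to external markup declarations.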

    White Space Handling

    In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

    An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

    A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve". For example:

    <!ATTLIST poem xml:space (default|preserve) 'preserve'>

    The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

    The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.

    End-of-Line Handling

    XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

    To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can conveniently be produced by normalizing all line breaks to #xA on input, before parsing.)
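
    The parenthetical remark above can be sketched in a few lines of Python; the function name normalize_line_ends is hypothetical. The order of the two replacements matters: the two-character sequence must be handled first so that it yields one #xA rather than two.

```python
def normalize_line_ends(text):
    """Map each #xD#xA pair and each remaining lone #xD to a single #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```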

    Language Identification

    In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages":

    Language Identification
    LanguageID ::= Langcode ('-' Subcode)*
    Langcode ::= ISO639Code | IanaCode | UserCode
    ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
    IanaCode ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
    UserCode ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
    Subcode ::= ([a-z] | [A-Z])+

    The Langcode may be any of the following:

    a two-letter language code as defined by [ISO 639], "Codes for the representation of names of languages"

    a language identifier registered with the Internet Assigned Numbers Authority [IANA]; these begin with the prefix "i-" (or "I-")

    a language identifier assigned by the user, or agreed on between parties in private use; these must begin with the prefix "x-" or "X-" in order to ensure that they do not conflict with names later standardized or registered with IANA

    There may be any number of Subcode segments; if the first subcode segment exists and the Subcode consists of two letters, then it must be a country code from [ISO 3166], "Codes for the representation of names of countries." If the first subcode consists of more than two letters, it must be a subcode for the language in question registered with IANA, unless the Langcode begins with the prefix "x-" or "X-".

    It is customary to give the language code in lower case, and the country code (if any) in upper case. Note that these values, unlike other names in XML documents, are case insensitive.
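
    The LanguageID grammar translates directly into a regular expression. The sketch below is in Python; the names is_language_id and same_language are hypothetical, and only the syntax is checked, not registration with ISO or IANA.

```python
import re

# LanguageID ::= Langcode ('-' Subcode)*
LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}"       # ISO639Code: exactly two letters
    r"|[iI]-[A-Za-z]+"      # IanaCode
    r"|[xX]-[A-Za-z]+)"     # UserCode
    r"(?:-[A-Za-z]+)*"      # any number of Subcode segments
)

def is_language_id(value):
    """Return True if value matches the LanguageID production syntactically."""
    return LANGUAGE_ID.fullmatch(value) is not None

def same_language(first, second):
    """Compare two language identifiers without regard to case."""
    return first.lower() == second.lower()
```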

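    For example:

    <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
    <p xml:lang="en-GB">What colour is it?</p>
    <p xml:lang="en-US">What color is it?</p>
    <sp who="Faust" desc='leise' xml:lang="de">
      <l>Habe nun, ach! Philosophie,</l>
      <l>Juristerei, und Medizin</l>
      <l>und leider auch Theologie</l>
      <l>durchaus studiert mit heißem Bemüh'n.</l>
    </sp>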

    The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
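
    The inheritance rule can be sketched as a recursive walk that carries the current language downward and replaces it wherever an element supplies its own xml:lang. The example uses Python's xml.etree.ElementTree, which exposes the attribute under the XML namespace name; the function name effective_languages is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs in document order."""
    language = element.get(XML_LANG, inherited)
    yield element.tag, language
    for child in element:
        yield from effective_languages(child, language)
```

    An element with no xml:lang of its own and no ancestor that declares one is reported with the language None.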

    A simple declaration for xml:lang might take the form

    xml:lang NMTOKEN #IMPLIED

    but specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:

    <!ATTLIST poem xml:lang NMTOKEN 'fr'>
    <!ATTLIST gloss xml:lang NMTOKEN 'en'>
    <!ATTLIST note xml:lang NMTOKEN 'en'>

    Logical Structures

    Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.

    Element element EmptyElemTag | STag content ETag

    This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.

    Element Type Match

    The Name in an element's end-tag must match the element type in the start-tag.

    Element Valid

    An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

    The declaration matches EMPTY and the element has no content.

    The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between each pair of child elements.

    The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

    The declaration matches ANY, and the types of any child elements have been declared.

    Start-Tags, End-Tags, and Empty-Element Tags

    The beginning of every non-empty XML element is marked by a start-tag. Start-tag STag '<' Name (S Attribute)* S? '>' Attribute Name Eq AttValue The Name in the start- and end-tags gives the element's type. The Name-AttValue pairs are referred to as the attribute specifications of the element, with the Name in each pair referred to as the attribute name and the content of the AttValue (the text between the ' or " delimiters) as the attribute value.

    Unique Att Spec

    No attribute name may appear more than once in the same start-tag or empty-element tag.

    Attribute Value Type

    The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see .)

    No External Entity References

    Attribute values cannot contain direct or indirect entity references to external entities.

    No < in Attribute Values

    The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.
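
    The well-formedness constraints on tags are enforced by any conforming parser, so they can be exercised simply by attempting a parse. The sketch below uses Python's expat bindings; the function name is_well_formed is hypothetical. The validity constraint Attribute Value Type is not covered, since expat is a non-validating parser.

```python
import xml.parsers.expat

def is_well_formed(document):
    """Return True if the parser accepts document, False on any fatal error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

    A mismatched end-tag, a repeated attribute name, and a literal < inside an attribute value are all rejected.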

    An example of a start-tag: <termdef id="dt-dog" term="dog">

    The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag: End-tag ETag '</' Name S? '>'

    An example of an end-tag:</termdef>

    The text between the start-tag and end-tag is called the element's content: Content of Elements content (element | CharData | Reference | CDSect | PI | Comment)*

    If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form: Tags for Empty Elements EmptyElemTag '<' Name (S Attribute)* S? '/>'

    Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag must be used, and can only be used, for elements which are declared EMPTY.

    Examples of empty elements: <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" /> <br></br> <br/>

    Element Type Declarations

    The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

    Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

    An element type declaration takes the form: Element Type Declaration elementdecl '<!ELEMENT' S Name S contentspec S? '>' contentspec 'EMPTY' | 'ANY' | Mixed | children where the Name gives the element type being declared.

    Unique Element Type Declaration

    No element type may be declared more than once.

    Examples of element type declarations: <!ELEMENT br EMPTY> <!ELEMENT p (#PCDATA|emph)* > <!ELEMENT %name.para; %content.para; > <!ELEMENT container ANY>

    Element Content

    An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear. The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles: Element-content Models children (choice | seq) ('?' | '*' | '+')? cp (Name | choice | seq) ('?' | '*' | '+')? choice '(' S? cp ( S? '|' S? cp )* S? ')' seq '(' S? cp ( S? ',' S? cp )* S? ')' where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

    The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see .

    Proper Group/PE Nesting

    Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

    For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

    Examples of element-content models: <!ELEMENT spec (front, body, back?)> <!ELEMENT div1 (head, (p | list | note)*, div2*)> <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
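
    Since a content model is a regular expression over element types, one way to test a sequence of children against it is to translate the model into an ordinary regular expression. The following Python sketch does this under several assumptions: the function names are hypothetical, parameter-entity references such as %div.mix; are taken to be already expanded, names are limited to ASCII, and the determinism requirement mentioned above is not checked.

```python
import re

def model_to_regex(model):
    """Translate an element-content model into a Python regular expression."""
    parts = []
    for token in re.findall(r"[A-Za-z_:][\w.:-]*|[()|,?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            # Each name is matched together with a trailing space separator.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model, children):
    """Return True if the list of child element types satisfies the model."""
    subject = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(subject) is not None
```

    Terminating every name with a space keeps one element type from matching a prefix of another, for example p against para.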

    Mixed Content

    An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements. In this case, the types of the child elements may be constrained, but not their order or their number of occurrences: Mixed-content Declaration Mixed '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')' where the Names give the types of elements that may appear as children.

    No Duplicate Types

    The same name must not appear more than once in a single mixed-content declaration.

    Examples of mixed content declarations: <!ELEMENT p (#PCDATA|a|ul|b|i|em)*> <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* > <!ELEMENT b (#PCDATA)>

    Attribute-List Declarations

    Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in . Attribute-list declarations may be used:

    To define the set of attributes pertaining to a given element type.

    To establish type constraints for these attributes.

    To provide default values for attributes.

    Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type: Attribute-list Declaration AttlistDecl '<!ATTLIST' S Name AttDef* S? '>' AttDef S Name S AttType S DefaultDecl The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

    When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.

    Attribute Types

    XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints, as noted: Attribute Types AttType StringType | TokenizedType | EnumeratedType StringType 'CDATA' TokenizedType 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

    ID

    Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

    One ID per Element Type

    No element type may have more than one ID attribute specified.

    ID Attribute Default

    An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

    IDREF

    Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

    Entity Name

    Values of type ENTITY must match the Name production, values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

    Name Token

    Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

    Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types: Enumerated Attribute Types EnumeratedType NotationType | Enumeration NotationType 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' Enumeration '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

    Notation Attributes

    Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

    Enumeration

    Values of this type must match one of the Nmtoken tokens in the declaration.

    For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

    Attribute Defaults

    An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

    Attribute Defaults
    DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

    In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.

    Required Attribute

    If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

    Attribute Default Legal

    The declared default value must meet the lexical constraints of the declared attribute type.

    Fixed Attribute Default

    If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

    Examples of attribute-list declarations: <!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED> <!ATTLIST list type (bullets|ordered|glossary) "ordered"> <!ATTLIST form method CDATA #FIXED "POST">
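
    The defaulting behavior can be observed with a parser that reads the internal subset. In the Python sketch below, expat is left in its default mode, in which defaulted attributes are reported along with specified ones; the function name root_attributes is hypothetical and the declaration for list is taken from the examples above.

```python
import xml.parsers.expat

def root_attributes(document):
    """Return the attributes reported for the first element in document."""
    captured = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: captured.append(dict(attrs)))
    parser.Parse(document, True)
    return captured[0]

# Internal subset declaring a default value for the type attribute of list.
DTD = '<!DOCTYPE list [<!ATTLIST list type (bullets|ordered|glossary) "ordered">]>'
```

    With this declaration in place, an empty <list/> is reported as though it had been written <list type="ordered"/>, while an explicit value in the start-tag is passed through unchanged.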

    Attribute-Value Normalization

    Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:

    a character reference is processed by appending the referenced character to the attribute value

    an entity reference is processed by recursively processing the replacement text of the entity

    a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity

    other characters are processed by appending them to the normalized value

    If the declared value is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

    All attributes for which no declaration has been read should be treated by a non-validating parser as if declared CDATA.
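
    The normalization steps listed above can be sketched as follows. This is an illustrative Python rendering, not a reference implementation: the function name normalize_attribute and its parameters are hypothetical, entity replacement texts are supplied by the caller as a dictionary, and no protection against recursive entities is included.

```python
import re

# Replacement texts for the predefined entities (simplified for this sketch).
PREDEFINED = {"amp": "&#38;", "lt": "&#60;", "gt": ">", "apos": "'", "quot": '"'}

# One piece per match: hex char ref, decimal char ref, entity ref,
# white space (a CR LF pair counts once), or any other single character.
_PIECE = re.compile(
    r"&#x([0-9a-fA-F]+);|&#([0-9]+);|&([^;&\s]+);|(\r\n|[ \t\r\n])|(.)",
    re.DOTALL)

def normalize_attribute(raw, entities=None, is_cdata=True):
    """Normalize a raw attribute value following the steps listed above."""
    table = dict(PREDEFINED)
    table.update(entities or {})

    def expand(text):
        out = []
        for hex_ref, dec_ref, name, space, other in _PIECE.findall(text):
            if hex_ref:
                out.append(chr(int(hex_ref, 16)))  # referenced character as-is
            elif dec_ref:
                out.append(chr(int(dec_ref)))
            elif name:
                out.append(expand(table[name]))    # recurse into replacement text
            elif space:
                out.append(" ")                    # white space becomes #x20
            else:
                out.append(other)
        return "".join(out)

    value = expand(raw)
    if not is_cdata:
        # Trim leading and trailing spaces, then collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

    Note that a character reference such as &#10; contributes a literal line feed, whereas a literal line feed in the source contributes a space.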

    Conditional Sections

    Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.

    Conditional Section
    conditionalSect ::= includeSect | ignoreSect
    includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
    ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
    ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
    Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

    Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

    If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. Note that for reliable parsing, the contents of even ignored conditional sections must be read in order to detect nested conditional sections and ensure that the end of the outermost (ignored) conditional section is properly detected. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored.

    If the keyword of the conditional section is a parameter-entity reference, the parameter entity must be replaced by its content before the processor decides whether to include or ignore the conditional section.

    An example: <!ENTITY % draft 'INCLUDE' > <!ENTITY % final 'IGNORE' > <![%draft;[ <!ELEMENT book (comments*, title, body, supplements?)> ]]> <![%final;[ <!ELEMENT book (title, body, supplements?)> ]]>

    Physical Structures

    An XML document may consist of one or many storage units. These are called entities; they all have content and are all (except for the document entity, see below, and the external DTD subset) identified by name. Each XML document has one entity called the document entity, which serves as the starting point for the XML processor and may contain the whole document.

    Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

    An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.

    Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of ENTITY or ENTITIES attributes.

    General entities are entities for use within the document content. In this specification, general entities are sometimes referred to with the unqualified term entity when this leads to no ambiguity. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts. Furthermore, they occupy different namespaces; a parameter entity and a general entity with the same name are two distinct entities.

    Character and Entity References

    A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

    Character Reference
    CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

    Legal Character

    Characters referred to using character references must match the production for Char.

    If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
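
    Decoding a character reference and applying the Legal Character constraint can be sketched as below. The function names are hypothetical; the ranges in is_legal_char follow the Char production given elsewhere in this specification.

```python
import re

# CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")

def is_legal_char(code_point):
    """Return True if the code point matches the Char production."""
    return (code_point in (0x9, 0xA, 0xD)
            or 0x20 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF)

def decode_char_ref(reference):
    """Return the character a reference denotes, or raise ValueError."""
    match = CHAR_REF.fullmatch(reference)
    if match is None:
        raise ValueError("not a character reference: %r" % reference)
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        code_point = int(hex_digits, 16)
    else:
        code_point = int(dec_digits)
    if not is_legal_char(code_point):
        raise ValueError("reference to an illegal character: %r" % reference)
    return chr(code_point)
```

    Both "&#x3C;" and "&#60;" decode to the less-than sign; a reference such as "&#0;" is rejected because #x0 does not match Char.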

    An entity reference refers to the content of a named entity. References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.

    Entity Reference
    Reference ::= EntityRef | CharRef
    EntityRef ::= '&' Name ';'
    PEReference ::= '%' Name ';'

    Entity Declared

    In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

    Entity Declared

    In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in the section Predefined Entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Parsed Entity

    An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

    No Recursion

    A parsed entity must not contain a recursive reference to itself, either directly or indirectly.
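    The No Recursion constraint can be checked by following references from each entity and watching for a name that reappears on the current path. The sketch below is an assumption-laden illustration, not a prescribed algorithm: the function name find_recursive_entity is hypothetical, entities are supplied as a plain mapping from name to replacement text, and ENTITY_REF recognizes only ASCII names rather than the full Name production.

```python
import re
from typing import Dict, List, Optional

# Simplified general-entity reference: '&' Name ';' with ASCII-only names.
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9._:-]*);")


def find_recursive_entity(entities: Dict[str, str]) -> Optional[List[str]]:
    """Return a chain of names that forms a cycle, or None if there is none."""

    def visit(name: str, path: List[str]) -> Optional[List[str]]:
        if name in path:
            # The name is already on the current path: direct or indirect recursion.
            return path[path.index(name):] + [name]
        for referenced in ENTITY_REF.findall(entities.get(name, "")):
            cycle = visit(referenced, path + [name])
            if cycle is not None:
                return cycle
        return None

    for name in entities:
        cycle = visit(name, [])
        if cycle is not None:
            return cycle
    return None
```

    For the mapping {"a": "&b;", "b": "&a;"} the function reports the chain a, b, a; for entities that refer only to non-recursive entities it returns None.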

    In DTD

    Parameter-entity references may only appear in the DTD.

    Examples of character and entity references:

    Type <key>less-than</key> (&#x3C;) to save options.
    This document was prepared on &docdate; and is classified &security-level;.

    Example of a parameter-entity reference: %ISOLat2;

    Entity Declarations

    Entities are declared thus:

    Entity Declaration
    EntityDecl ::= GEDecl | PEDecl
    GEDecl     ::= '<!ENTITY' S Name S EntityDef S? '>'
    PEDecl     ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
    EntityDef  ::= EntityValue | (ExternalID NDataDecl?)
    PEDef      ::= EntityValue | ExternalID

    The Name identifies the entity in an entity reference or, in the case of an unparsed entity, in the value of an ENTITY or ENTITIES attribute. If the same entity is declared more than once, the first declaration encountered is binding; at user option, an XML processor may issue a warning if entities are declared multiple times.
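    The rule that the first declaration encountered is binding can be illustrated with a small Python sketch. It is deliberately simplified and the names are hypothetical: GE_DECL handles only internal general entities with quoted values, and collect_internal_entities returns the duplicate names so that a caller may, at user option, issue a warning.

```python
import re

# Simplified GEDecl for internal entities: '<!ENTITY' S Name S EntityValue S? '>'
GE_DECL = re.compile(
    r"""<!ENTITY\s+([A-Za-z_:][\w.:-]*)\s+(?:"([^"]*)"|'([^']*)')\s*>"""
)


def collect_internal_entities(dtd_text):
    """Return (table, duplicates); the first declaration of a name wins."""
    table = {}
    duplicates = []
    for match in GE_DECL.finditer(dtd_text):
        name = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        if name in table:
            # Later declarations are ignored; remember them for a warning.
            duplicates.append(name)
        else:
            table[name] = value
    return table, duplicates
```

    Given two declarations of the same entity, only the value from the first one is kept in the returned table.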

    Internal Entities

    If the entity definition is an EntityValue, the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration. Note that some processing of entity and character references in the literal entity value may be required to produce the correct replacement text: see the section Construction of Internal Entity Replacement Text.

    An internal entity is a parsed entity.

    Example of an internal entity declaration: <!ENTITY Pub-Status "This is a pre-release of the specification.">

    External Entities

    If the entity is not internal, it is an external entity, declared as follows:

    External Entity Declaration
    ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
    NDataDecl  ::= S 'NDATA' S Name

    If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.

    Notation Declared

    The Name must match the declared name of a notation.

    The SystemLiteral is called the entity's system identifier. It is a URI, which may be used to retrieve the entity. Note that the hash mark (#) and fragment identifier frequently used with URIs are not, formally, part of the URI itself; an XML processor may signal an error if a fragment identifier is given as part of a system identifier. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element type defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs. A URI might thus be relative to the document entity, to the entity containing the external DTD subset, or to some other external parameter entity.

    An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
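    The escaping step described above can be sketched in a few lines of Python. The function name escape_non_ascii is hypothetical, and the sketch only escapes non-ASCII characters; it does not attempt any other URI normalization.

```python
def escape_non_ascii(uri):
    """Replace each non-ASCII character by the %HH escapes of its UTF-8 bytes."""
    pieces = []
    for character in uri:
        if ord(character) < 0x80:
            # ASCII characters are passed through unchanged.
            pieces.append(character)
        else:
            for byte in character.encode("utf-8"):
                pieces.append("%{:02X}".format(byte))
    return "".join(pieces)
```

    For example, the character U+00E9 is encoded in UTF-8 as the bytes C3 A9, so "caf\u00e9.xml" becomes "caf%C3%A9.xml".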

    In addition to a system identifier, an external identifier may include a public identifier. An XML processor attempting to retrieve the entity's content may use the public identifier to try to generate an alternative URI. If the processor is unable to do so, it must use the URI specified in the system literal. Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
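    The normalization of a public identifier before matching is simple enough to show directly. In this Python sketch the function name normalize_public_id is hypothetical; white space is taken to be the space, tab, carriage return, and line feed characters.

```python
import re


def normalize_public_id(public_id):
    """Collapse runs of white space to one space and trim both ends."""
    collapsed = re.sub(r"[\x20\x09\x0D\x0A]+", " ", public_id)
    return collapsed.strip(" ")
```

    Two public identifiers that differ only in line breaks or repeated spaces therefore compare equal after normalization.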

    Examples of external entity declarations:

    <!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

    Parsed Entities

    The Text Declaration

    External parsed entities may each begin with a text declaration.

    Text Declaration
    TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

    The text declaration must be provided literally, not by reference to a parsed entity. No text declaration may appear at any position other than the beginning of an external parsed entity.

    Well-Formed Parsed Entities

    The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. An external parameter entity is well-formed if it matches the production labeled extPE.

    Well-Formed External Parsed Entity
    extParsedEnt ::= TextDecl? content
    extPE        ::= TextDecl? extSubsetDecl

    An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition.

    A consequence of well-formedness in entities is that the logical and physical structures in an XML document are properly nested; no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

    Character Encoding in Entities

    Each external parsed entity in an XML document may use a different encoding for its characters. All XML processors must be able to read entities in either UTF-8 or UTF-16.

    Entities encoded in UTF-16 must begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.

    Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration:

    Encoding Declaration
    EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
    EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*    /* Encoding name contains only Latin characters */

    In the document entity, the encoding declaration is part of the XML declaration. The EncName is the name of the encoding used.

    In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. XML processors may recognize other encodings; it is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority (IANA), other than those just listed, should be referred to using their registered names. Note that these registered names are defined to be case-insensitive, so processors wishing to match against them should do so in a case-insensitive way.
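    Two small checks follow from the preceding paragraphs: whether a string is a syntactically valid EncName, and whether a declared name matches a registered name without regard to case. The Python sketch below uses hypothetical function names and makes no attempt to consult the actual registry.

```python
import re

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z]([A-Za-z0-9._]|-)*")


def is_valid_enc_name(name):
    """Return True if the whole string matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


def same_encoding(declared, registered):
    """Compare two encoding names case-insensitively."""
    return declared.lower() == registered.lower()
```

    With these helpers, "Shift_JIS" is a valid name, "8859-1" is not (it does not start with a letter), and "utf-8" matches "UTF-8".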

    In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, for an encoding declaration to occur other than at the beginning of an external entity, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
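    In the absence of external information, the rules above suggest a simple decision order: a Byte Order Mark signals UTF-16, an encoding declaration names the encoding, and otherwise the entity is UTF-8. The Python sketch below follows that order under stated assumptions: the name guess_encoding is hypothetical, the declaration is assumed to be readable as ASCII-compatible bytes, and no fuller autodetection is attempted.

```python
import re

# An encoding declaration inside a leading '<?xml ... ?>' declaration.
ENC_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def guess_encoding(data):
    """Guess the encoding of an entity from its first bytes."""
    # A Byte Order Mark (#xFEFF in either byte order) marks UTF-16.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "UTF-16"
    match = ENC_DECL.match(data)
    if match is not None:
        return match.group(2).decode("ascii")
    # Neither a Byte Order Mark nor an encoding declaration: UTF-8.
    return "UTF-8"
```

    A caller would still need to verify that the bytes really are in the named encoding, since presenting an entity in a different encoding from the one declared is an error.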

    It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process.

    Examples of encoding declarations:

    <?xml encoding='UTF-8'?>
    <?xml encoding='EUC-JP'?>

    XML Processor Treatment of Entities and References

    The table below summarizes the contexts in which character references, entity references, and invocations of unparsed entities might appear and the required behavior of an XML processor in each case. The labels in the leftmost column describe the recognition context:

    Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.

    Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.

    Occurs as Attribute Value: as a Name, not a reference, appearing either as the value of an attribute which has been declared as type ENTITY, or as one of the space-separated tokens in the value of an attribute which has been declared as type ENTITIES.

    Reference in EntityValue: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.

    Reference in DTD: as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue or AttValue.

    Entity Type

    Context                       Parameter            Internal General     External Parsed General  Unparsed   Character
    Reference in Content          Not recognized       Included             Included if validating   Forbidden  Included
    Reference in Attribute Value  Not recognized       Included in literal  Forbidden                Forbidden  Included
    Occurs as Attribute Value     Not recognized       Forbidden            Forbidden                Notify     Not recognized
    Reference in EntityValue      Included in literal  Bypassed             Bypassed                 Forbidden  Included
    Reference in DTD              Included as PE       Forbidden            Forbidden                Forbidden  Forbidden

    Not Recognized

    Outside the DTD, the % character has no special significance; thus, what would be parameter entity references in the DTD are not recognized as markup in content. Similarly, the names of unparsed entities are not recognized except when they appear in the value of an appropriately declared attribute.

    Included

    An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way, except that the replacement text of entities used to escape markup delimiters (the entities amp, lt, gt, apos, quot) is always treated as data. (The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.) A character reference is included when the indicated character is processed in place of the reference itself.

    Included If Validating

    When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor must include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text. If a non-validating parser does not include the replacement text, it must inform the application that it recognized, but did not read, the entity.

    This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

    Forbidden

    The following are forbidden, and constitute fatal errors:

    the appearance of a reference to an unparsed entity.

    the appearance of any character or general-entity reference in the DTD except within an EntityValue or AttValue.

    a reference to an external entity in an attribute value.

    Included in Literal

    When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text is processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. For example, this is well-formed: ]]> while this is not: <!ENTITY EndAttr "27'" > <element attribute='a-&EndAttr;>

    Notify

    When the name of an unparsed entity appears as a token in the value of an attribute of declared type ENTITY or ENTITIES, a validating processor must inform the application of the system and public (if any) identifiers for both the entity and its associated notation.

    Bypassed

    When a general entity reference appears in the EntityValue in an entity declaration, it is bypassed and left as is.

    Included as PE

    Just as with external parsed entities, parameter entities need only be included if validating. When a parameter-entity reference is recognized in the DTD and included, its replacement text is enlarged by the attachment of one leading and one following space (#x20) character; the intent is to constrain the replacement text of parameter entities to contain an integral number of grammatical tokens in the DTD.
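    The padding rule for parameter-entity references recognized in the DTD can be shown with a short Python sketch. The names PE_REF and include_pe_refs are hypothetical, names are limited to ASCII, the mapping is assumed to hold finished replacement text, and an undeclared name simply raises KeyError.

```python
import re

# Simplified PEReference: '%' Name ';' with ASCII-only names.
PE_REF = re.compile(r"%([A-Za-z_:][A-Za-z0-9._:-]*);")


def include_pe_refs(dtd_text, parameter_entities):
    """Replace each PE reference by its replacement text padded with spaces."""

    def expand(match):
        # One leading and one following space (#x20) are attached.
        return " " + parameter_entities[match.group(1)] + " "

    return PE_REF.sub(expand, dtd_text)
```

    Because of the added spaces, the replacement text cannot fuse with a neighbouring token; "<!ELEMENT p %content;>" with content declared as "(#PCDATA)" becomes "<!ELEMENT p  (#PCDATA) >".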

    Construction of Internal Entity Replacement Text

    In discussing the treatment of internal entities, it is useful to distinguish two forms of the entity's value. The literal entity value is the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. The replacement text is the content of the entity, after replacement of character references and parameter-entity references.

    The literal entity value as given in an internal entity declaration (EntityValue) may contain character, parameter-entity, and general-entity references. Such references must be contained entirely within the literal entity value. The actual replacement text that is included as described above must contain the replacement text of any parameter entities referred to, and must contain the character referred to, in place of any character references in the literal entity value; however, general-entity references must be left as-is, unexpanded. For example, given the following declarations: ]]> then the replacement text for the entity "book" is: La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights; The general-entity reference "&rights;" would be expanded should the reference "&book;" appear in the document's content or an attribute value.
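    The construction of replacement text can be sketched as a single left-to-right pass over the literal entity value: character references and parameter-entity references are replaced, and general-entity references are left untouched. In the Python sketch below the names TOKEN and replacement_text are hypothetical, names are ASCII-only, and the literal value and the "pub" parameter entity used in the usage note are assumed inputs chosen to reproduce the replacement text quoted above.

```python
import re

# Hexadecimal character reference, decimal character reference, or PE reference.
TOKEN = re.compile(
    r"&#x([0-9a-fA-F]+);|&#([0-9]+);|%([A-Za-z_:][A-Za-z0-9._:-]*);"
)


def replacement_text(literal_value, parameter_entities):
    """Build replacement text from a literal entity value."""

    def expand(match):
        hex_digits, dec_digits, pe_name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits, 10))
        # Parameter-entity reference: insert its replacement text unpadded.
        return parameter_entities[pe_name]

    # re.sub does not rescan inserted text, so results are not expanded again,
    # and general-entity references such as '&rights;' never match TOKEN.
    return TOKEN.sub(expand, literal_value)
```

    With the assumed literal value "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" and "pub" bound to "Éditions Gallimard", the function returns "La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;".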

    These simple rules may have complex interactions; for a detailed discussion of a difficult example, see the appendix Expansion of Entity and Character References.

    Predefined Entities

    Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

    All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them. If the entities in question are declared, they must be declared as internal entities whose replacement text is the single character being escaped or a character reference to that character, as shown below.

    <!ENTITY lt     "&#38;#60;">
    <!ENTITY gt     "&#62;">
    <!ENTITY amp    "&#38;#38;">
    <!ENTITY apos   "&#39;">
    <!ENTITY quot   "&#34;">

    Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.
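    For an application that writes XML, the predefined entities give a straightforward way to escape character data. The following Python sketch is one possible approach; the names PREDEFINED and escape_text are hypothetical. The ampersand is replaced first so that the ampersands introduced by the other replacements are not escaped a second time.

```python
# Character to predefined-entity reference; '&' must come first.
PREDEFINED = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
]


def escape_text(text):
    """Escape markup delimiters using the predefined entities."""
    for character, reference in PREDEFINED:
        text = text.replace(character, reference)
    return text
```

    For example, escape_text("AT&T <b>") produces "AT&amp;T &lt;b&gt;".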

    Notation Declarations

    Notations identify by name the format of unparsed entities, the format of elements which bear a notation attribute, or the application to which a processing instruction is addressed.

    Notation declarations provide a name for the notation, for use in entity and attribute-list declarations and in attribute specifications, and an external identifier for the notation which may allow an XML processor or its client application to locate a helper application capable of processing data in the given notation. Notation Declarations NotationDecl '<!NOTATION' S Name S (ExternalID | PublicID) S? '>' PublicID 'PUBLIC' S PubidLiteral

    XML processors must provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration. They may additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described. (It is not an error, however, for XML documents to declare and refer to notations for which notation-specific applications are not available on the system where the XML processor or application is running.)

    Document Entity

    The document entity serves as the root of the entity tree and a starting-point for an XML processor. This specification does not specify how the document entity is to be located by an XML processor; unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

    Conformance

    Validating and Non-Validating Processors

    Conforming XML processors fall into two classes: validating and non-validating.

    Validating and non-validating processors alike must report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.

    Validating processors must report violations of the constraints expressed by the declarations in the DTD, and failures to fulfill the validity constraints given in this specification. To accomplish this, validating XML processors must read and process the entire DTD and all external parsed entities referenced in the document.

    Non-validating processors are required to check only the document entity, including the entire internal DTD subset, for well-formedness. While they are not required to check the document for validity, they are required to process all the declarations they read in the internal DTD subset and in any parameter entity that they read, up to the first reference to a parameter entity that they do not read; that is to say, they must use the information in those declarations to normalize attribute values, include the replacement text of internal entities, and supply default attribute values. They must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.

    Using XML Processors

    The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations. Less is required of a non-validating processor; it need not read any part of the document other than the document entity. This has two effects that may be important to users of XML processors:

    Certain well-formedness errors, specifically those that require reading external entities, may not be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in the section XML Processor Treatment of Entities and References.

    The information passed from the processor to the application may vary, depending on whether the processor reads parameter and external entities. For example, a non-validating processor may not normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities.

    For maximum reliability in interoperating between different XML processors, applications which use non-validating processors should not rely on any behaviors not required of such processors. Applications which require facilities such as the use of default attributes or internal entities which are declared in external entities should use validating XML processors.

    Notation

    The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

    symbol ::= expression

    Symbols are written with an initial capital letter if they are defined by a regular expression, or with an initial lower case letter otherwise. Literal strings are quoted.

    Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

    #xN: where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

    [a-zA-Z], [#xN-#xN]: matches any character with a value in the range(s) indicated (inclusive).

    [^a-z], [^#xN-#xN]: matches any character with a value outside the range indicated.

    [^abc], [^#xN#xN#xN]: matches any character with a value not among the characters given.

    "string": matches a literal string matching that given inside the double quotes.

    'string': matches a literal string matching that given inside the single quotes.

    These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

    (expression): expression is treated as a unit and may be combined as described in this list.

    A?: matches A or nothing; optional A.

    A B: matches A followed by B.

    A | B: matches A or B but not both.

    A - B: matches any string that matches A but does not match B.

    A+: matches one or more occurrences of A.

    A*: matches zero or more occurrences of A.

    Other notations used in the productions are:

    /* ... */: comment.

    [ wfc: ... ]: well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

    [ vc: ... ]: validity constraint; this identifies by name a constraint on valid documents associated with a production.

    References

    Normative References

    IANA (Internet Assigned Numbers Authority). Official Names for Character Sets, ed. Keld Simonsen et al. See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

    IETF (Internet Engineering Task Force). RFC 1766: Tags for the Identification of Languages, ed. H. Alvestrand. 1995.

    ISO (International Organization for Standardization). ISO 639:1988 (E). Code for the representation of names of languages. [Geneva]: International Organization for Standardization, 1988.

    ISO (International Organization for Standardization). ISO 3166-1:1997 (E). Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes. [Geneva]: International Organization for Standardization, 1997.

    ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).

    The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996.

    Other References

    Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Reading: Addison-Wesley, 1986, rpt. corr. 1988.

    Berners-Lee, T., R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax and Semantics. 1997. (Work in progress; see updates to RFC1738.)

    Brüggemann-Klein, Anne. Regular Expressions into Finite Automata. Extended abstract in I. Simon, Hrsg., LATIN 1992, S. 97-98. Springer-Verlag, Berlin 1992. Full Version in Theoretical Computer Science 120: 197-213, 1993.

    Brüggemann-Klein, Anne, and Derick Wood. Deterministic Regular Languages. Universität Freiburg, Institut für Informatik, Bericht 38, Oktober 1991.

    James Clark. Comparison of SGML and XML. See http://www.w3.org/TR/NOTE-sgml-xml-971215.

    IETF (Internet Engineering Task Force). RFC 1738: Uniform Resource Locators (URL), ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994.

    IETF (Internet Engineering Task Force). RFC 1808: Relative Uniform Resource Locators, ed. R. Fielding. 1995.

    IETF (Internet Engineering Task Force). RFC 2141: URN Syntax, ed. R. Moats. 1997.

    ISO (International Organization for Standardization). ISO 8879:1986(E). Information processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML). First edition -- 1986-10-15. [Geneva]: International Organization for Standardization, 1986.

    ISO (International Organization for Standardization). ISO/IEC 10744-1992 (E). Information technology -- Hypermedia/Time-based Structuring Language (HyTime). [Geneva]: International Organization for Standardization, 1992. Extended Facilities Annexe. [Geneva]: International Organization for Standardization, 1996.

    Character Classes

    Following the characteristics defined in the Unicode standard, characters are classed as base characters (among others, these contain the alphabetic characters of the Latin alphabet, without diacritics), ideographic characters, and combining characters (among others, this class contains most diacritics); these classes combine to form the class of letters. Digits and extenders are also distinguished. Characters Letter BaseChar | Ideographic BaseChar [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 | [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] | [#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] | [#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] | [#x0B99-#x0B9A] | 
#x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] | [#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3] Ideographic [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029] CombiningChar [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] | #x093C | [#x093E-#x094C] 
| #x094D | [#x0951-#x0954] | [#x0962-#x0963] | [#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] | #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] | [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] | #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] | #x3099 | #x309A Digit [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29] Extender #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

    The character classes defined here can be derived from the Unicode character database as follows:

    Name-start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

    Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

    Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.

    Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

    The following characters are treated as name-start characters rather than name characters, because the property file classifies them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.

    Characters #x20DD-#x20E0 are excluded (in accordance with Unicode, section 5.14).

    Character #x00B7 is classified as an extender, because the property list so identifies it.

    Character #x0387 is added as a name character, because #x00B7 is its canonical equivalent.

    Characters ':' and '_' are allowed as name-start characters.

    Characters '-' and '.' are allowed as name characters.
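    The derivation rules above can be sketched in code. The following is a rough illustration only, not the normative definition: it uses Python's standard unicodedata module, which reflects a much newer Unicode database than the one the productions were derived from, so its answers can differ from the tables for individual characters. The function name name_class and its return values are assumptions made for this example.

```python
import unicodedata

# General categories named in the derivation rules.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
OTHER_NAME_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def name_class(ch):
    """Classify one character as 'name-start', 'name', or None (not allowed).

    Hypothetical helper: it follows the prose rules, not the literal
    BaseChar/CombiningChar/Digit/Extender tables.
    """
    cp = ord(ch)
    # Explicitly allowed punctuation.
    if ch in ":_":
        return "name-start"
    if ch in "-.":
        return "name"
    # Compatibility area and the excluded enclosing marks.
    if 0xF900 < cp < 0xFFFE:
        return None
    if 0x20DD <= cp <= 0x20E0:
        return None
    # Characters promoted to name-start because they are Alphabetic.
    if 0x02BB <= cp <= 0x02C1 or cp in (0x0559, 0x06E5, 0x06E6):
        return "name-start"
    # The extender #x00B7 and its canonical equivalent #x0387.
    if cp in (0x00B7, 0x0387):
        return "name"
    # A compatibility formatting tag shows up as a leading '<'.
    if unicodedata.decomposition(ch).startswith("<"):
        return None
    category = unicodedata.category(ch)
    if category in NAME_START_CATEGORIES:
        return "name-start"
    if category in OTHER_NAME_CATEGORIES:
        return "name"
    return None


if __name__ == "__main__":
    for sample in ["a", "_", "-", "5", "\u00b7", "\ufb01", " "]:
        print(repr(sample), name_class(sample))
```

    A check of a whole name would require the first character to be 'name-start' and every later character to be either 'name-start' or 'name'.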

    XML and SGML

    XML is designed to be a subset of SGML, in that every valid XML document should also be a conformant SGML document. For a detailed comparison of the additional restrictions that XML places on documents beyond those of SGML, see .

    Expansion of Entity and Character References

    This appendix contains some examples illustrating the sequence of entity- and character-reference recognition and expansion, as specified in .

    If the DTD contains the declaration

    <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >

    then the XML processor will recognize the character references when it parses the entity declaration, and resolve them before storing the following string as the value of the entity "example":

    <p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>

    A reference in the document to "&example;" will cause the text to be reparsed, at which time the start- and end-tags of the "p" element will be recognized and the three references will be recognized and expanded, resulting in a "p" element with the following content (all data, no delimiters or markup):

    An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).

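    A more complex example will illustrate the rules and their effects fully. In the following example, the line numbers are solely for reference.

    1 <?xml version='1.0'?>
    2 <!DOCTYPE test [
    3 <!ELEMENT test (#PCDATA) >
    4 <!ENTITY % xx '&#37;zz;'>
    5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
    6 %xx;
    7 ]>
    8 <test>This sample shows a &tricky; method.</test>

    This produces the following: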

    in line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since "zz" is not yet declared.)

    in line 5, the character reference "&#60;" is expanded immediately and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.

    in line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed. The reference to "zz" is recognized in its turn, and its replacement text ("<!ENTITY tricky "error-prone" >") is parsed. The general entity "tricky" has now been declared, with the replacement text "error-prone".

    in line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the "test" element is the self-describing (and ungrammatical) string This sample shows a error-prone method.
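    The two-stage expansion described in this appendix can be observed with an ordinary parser. The sketch below is an illustration, not part of the specification: it feeds the simpler "example" entity to Python's standard xml.etree.ElementTree (which relies on Expat) and reads back the content of the resulting "p" element. The wrapper element "doc" and the function name expand_example are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The entity declaration from the appendix, wrapped in a small document.
# "doc" is a hypothetical root element chosen for this illustration.
DOCUMENT = (
    '<!DOCTYPE doc [\n'
    '<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically '
    '(&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >\n'
    ']>\n'
    '<doc>&example;</doc>'
)


def expand_example():
    """Parse the document and return the text of the expanded "p" element."""
    root = ET.fromstring(DOCUMENT)
    # The reference &example; is reparsed, so "p" appears as a real element.
    paragraph = root.find("p")
    return paragraph.text


if __name__ == "__main__":
    print(expand_example())
    # An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
```

    The character references are resolved once when the declaration is read and again when the replacement text is reparsed, which is why each escape in the declaration carries one more level of "&#38;" than the text finally produced.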

    Deterministic Content Models

    For compatibility, it is required that content models in element type declarations be deterministic.

    SGML requires deterministic content models (it calls them "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

    For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser cannot know which b in the model is being matched without looking ahead to see which element follows the b. In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)). An initial b now clearly matches only a single name in the content model. The parser doesn't need to look ahead to see what follows; either c or d would be accepted.

    More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.
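    The follow-set test can be sketched as below. This is a minimal illustration under stated assumptions, not the algorithm from the cited textbook: content models are written as nested Python tuples (a representation invented for this example), and the names analyze and is_deterministic are hypothetical. The set of possible first positions is checked in the same way as each follow set.

```python
from itertools import count


def analyze(model):
    """Return (first, follow, labels) for a content model given as nested tuples.

    Node forms (an assumed representation): ("name", "b"), ("seq", x, y, ...),
    ("choice", x, y, ...), ("opt", x), ("star", x), ("plus", x).
    """
    labels = {}   # position -> element type name
    follow = {}   # position -> positions that may come next
    counter = count()

    def visit(node):
        kind = node[0]
        if kind == "name":
            pos = next(counter)
            labels[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for child in node[1:]:
                c_null, c_first, c_last = visit(child)
                for p in last:
                    follow[p] |= c_first
                if nullable:
                    first |= c_first
                last = c_last | (last if c_null else set())
                nullable = nullable and c_null
            return nullable, first, last
        if kind == "choice":
            results = [visit(child) for child in node[1:]]
            return (any(r[0] for r in results),
                    set().union(*(r[1] for r in results)),
                    set().union(*(r[2] for r in results)))
        # Occurrence indicators: "opt" (?), "star" (*), "plus" (+).
        c_null, c_first, c_last = visit(node[1])
        if kind in ("star", "plus"):
            for p in c_last:
                follow[p] |= c_first
        return c_null or kind in ("opt", "star"), c_first, c_last

    _, first, _ = visit(model)
    return first, follow, labels


def is_deterministic(model):
    """True if no first or follow set names the same element type twice."""
    first, follow, labels = analyze(model)
    for candidates in [first, *follow.values()]:
        names = [labels[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True


# ((b, c) | (b, d)) and its rewritten form (b, (c | d)).
AMBIGUOUS = ("choice", ("seq", ("name", "b"), ("name", "c")),
             ("seq", ("name", "b"), ("name", "d")))
REWRITTEN = ("seq", ("name", "b"), ("choice", ("name", "c"), ("name", "d")))

if __name__ == "__main__":
    print(is_deterministic(AMBIGUOUS))   # False
    print(is_deterministic(REWRITTEN))   # True
```

    For ((b, c) | (b, d)) the two b positions both appear among the possible first positions, which is the conflict described in the prose; after collapsing them into (b, (c | d)) every set names each element type at most once.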

    Algorithms exist which allow many but not all non-deterministic content models to be reduced automatically to equivalent deterministic models; see Brüggemann-Klein 1991.

    Autodetection of Character Encodings

    The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself. Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying (external) information. We consider the first case first.

    Because each XML entity not in UTF-8 or UTF-16 format must begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF".

    00 00 00 3C: UCS-4, big-endian machine (1234 order)

    3C 00 00 00: UCS-4, little-endian machine (4321 order)

    00 00 3C 00: UCS-4, unusual octet order (2143)

    00 3C 00 00: UCS-4, unusual octet order (3412)

    FE FF: UTF-16, big-endian

    FF FE: UTF-16, little-endian

    00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 00 3F 00: UTF-16, little-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the ASCII characters, the encoding declaration itself may be read reliably

    4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)

    other: UTF-8 without an encoding declaration, or else the data stream is corrupt, fragmentary, or enclosed in a wrapper of some kind
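    The table above translates directly into a lookup on the first octets of an entity. The following sketch assumes nothing beyond the listed octet patterns; the function name detect_encoding_family and the returned strings are made up for this illustration, and the result only identifies a family, so the encoding declaration still has to be read afterwards.

```python
# Four-octet patterns from the list above, mapped to the family they indicate.
FOUR_OCTET_CASES = {
    b"\x00\x00\x00\x3c": "UCS-4, big-endian (1234 order)",
    b"\x3c\x00\x00\x00": "UCS-4, little-endian (4321 order)",
    b"\x00\x00\x3c\x00": "UCS-4, unusual octet order (2143)",
    b"\x00\x3c\x00\x00": "UCS-4, unusual octet order (3412)",
    b"\x00\x3c\x00\x3f": "UTF-16, big-endian, no Byte Order Mark",
    b"\x3c\x00\x3f\x00": "UTF-16, little-endian, no Byte Order Mark",
    b"\x3c\x3f\x78\x6d": "ASCII-compatible (read the encoding declaration)",
    b"\x4c\x6f\xa7\x94": "EBCDIC (read the encoding declaration)",
}


def detect_encoding_family(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first octets."""
    head = data[:4]
    if head in FOUR_OCTET_CASES:
        return FOUR_OCTET_CASES[head]
    # The Byte Order Mark cases need only two octets.
    if head[:2] == b"\xfe\xff":
        return "UTF-16, big-endian"
    if head[:2] == b"\xff\xfe":
        return "UTF-16, little-endian"
    return "other (UTF-8 without declaration, or corrupt/wrapped data)"


if __name__ == "__main__":
    print(detect_encoding_family(b"<?xml version='1.0'?>"))
    print(detect_encoding_family("\ufeff<?xml".encode("utf-16-be")))
```

    None of the four-octet patterns begins with FE FF or FF FE, so the order in which the two groups are tested does not affect the result.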

    This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859 from each other, or to distinguish the specific EBCDIC code page in use, and so on).

    Because the contents of the encoding declaration are restricted to ASCII characters, a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable.

    Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input.

    Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration. Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity.

    The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended.

    If an XML entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.
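    As a rough illustration of the three recommended rules, the sketch below returns which source of information is authoritative for a given delivery method. The function name encoding_authority, the "file" label, and the returned strings are assumptions made for this example; a real processor would follow whatever the governing protocol documentation specifies.

```python
def encoding_authority(delivery, charset=None):
    """Return (authoritative source, value) for an entity's character encoding.

    `delivery` is "file" or a MIME type; `charset` is the MIME charset
    parameter, if one was supplied. Both names are illustrative only.
    """
    if delivery == "text/xml":
        # The charset parameter on the MIME type decides.
        return ("charset parameter", charset)
    if delivery in ("file", "application/xml"):
        # The Byte-Order Mark and encoding declaration decide, if present.
        return ("Byte-Order Mark and encoding declaration", None)
    # Anything else is outside the recommended rules.
    return ("unspecified", None)


if __name__ == "__main__":
    print(encoding_authority("text/xml", "iso-8859-1"))
    print(encoding_authority("application/xml", "iso-8859-1"))
```

    In every case the other sources of information remain available, but only for error recovery.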

    W3C XML Working Group

    This specification was prepared and approved for publication by the W3C XML Working Group (WG). WG approval of this specification does not necessarily imply that all WG members voted for its approval. The current and former members of the XML WG are:

    Jon Bosak, Sun (Chair)
    James Clark (Technical Lead)
    Tim Bray, Textuality and Netscape (XML Co-editor)
    Jean Paoli, Microsoft (XML Co-editor)
    C. M. Sperberg-McQueen, U. of Ill. (XML Co-editor)
    Dan Connolly, W3C (W3C Liaison)
    Paula Angerstein, Texcel
    Steve DeRose, INSO
    Dave Hollander, HP
    Eliot Kimber, ISOGEN
    Eve Maler, ArborText
    Tom Magliery, NCSA
    Murray Maloney, Muzmo and Grif
    Makoto Murata, Fuji Xerox Information Systems
    Joel Nava, Adobe
    Conleth O'Connell, Vignette
    Peter Sharpe, SoftQuad
    John Tigue, DataChannel
    XML-XSLT-0.48/examples/agenda.html0100644000076500007650000032175207115344401017032 0ustar jonathanjonathanContent-type: text/html Sigma Agenda

    Januari

    4/1/1999 Borrel Nieuwjaarsborrel
    16.30
    kantine B-faculteit
    Subfaculteit Scheikunde
    13/1/1999 E-lezing Informed Chemistry: what can it do for synthesis?
    Informed Chemistry: what can it do for synthesis?
    16.00
    Internet
    Chemweb.Com
    13/1/1999 Borrel Nieuwjaarsborrel
    16.00
    Ul-kantine
    Sigma
    25/1/1999 Lezing The development of oxidation state +IV for palladium in it's organometalic chemistry
    11.30-12.30
    A0004
    Spreker: Prof. dr. A.J. Canty, University of Tasmania
    Anorg. Chem. (NSR Center)
    26/1/1999 Symposium Virtuele Universiteit of Universele Virtualiteit
    13:30
    Raadhuis Nijmegen
    Gemeente Nijmegen
    28/1/1999 Promotie Exploring tertiary folding in RNA
    13:30 precies
    KUN Aula
    Door M.H.Kolk
    KUN
    29/1/1999 Sport Schaatsen
    Schaatsen
    va 14.00
    Triavium
    Sigma

    Februari

    1/2/1999 Begin Voorjaarssemester





    KUN
    2/2/1999 Lezing "Nieuwe materialen op basis van organische synthese"
    14.00
    CZ I
    Spreker: dr. Frank van Veggel, Laboratorium voor organische chemie, Universiteit Twente
    NSR
    2/2/1999 Lezing "Transition metal catalysed carbon-carbon bond formation"
    16.00
    CZ I
    Spreker: dr Paul CJ Kamer, Institute of Molecular Chemistry, University of Amsterdam
    NSR
    3/2/1999 Lezing "Novel approaches for the synthesis of druglike building blocks"
    10.00
    CZ I
    Spreker: dr Floris PJT Rutjes, Institute of Molecular Chemistry, University of Amsterdam
    NSR
    4/2/1999 Borrel BBB-bestuurswisselborrel
    BBB-bestuurswisselborrel
    16.30u
    Collegezalenrondgang
    BBB
    4/2/1999 Film Alien IV
    Alien IV
    19.30u
    CZ N2
    Kosten: f1,50
    St. Beet
    5/2/1999 Deadline G-mi





    9/2/1999 Cursus Sollicitatietraining
    Sollicitatietraining
    10.30u-12.30u
    CZ N3
    BBB
    9/2/1999 Sigma avond
    21.00u
    Cafe de Fiets
    Sigma
    10/2/1999 Symposium Natuurkunde in een Notendop
    Natuurkunde in een Notendop
    9.30u-16.30u
    Marie Curie
    10/2/1999 Sport Tafelvoetbaltoernooi
    15.45u
    Inschrijven bij Thalia
    Thalia
    11/2/1999 Borrel W en N Carnavals-Beestborrel
    Marie-Curie kantine
    Marie Curie
    11/2/1999 Lezing The Structure and Fluxional Behaviour of the Binary Carbonyls
    The Structure and Fluxional Behaviour of the Binary Carbonyls
    17.00
    Internet
    Chemweb.Com
    11/2/1999 Feest BeestFeest
    Doornroosje
    St. Beet
    15:19/2/1999 Vrij Carnavalsvakantie





    23/2/1999 Sigma avond
    21.00u
    Cafe de Fiets
    Sigma
    24/2/1999 Lezing Schilderijrestauraties
    Schilderijrestauraties
    16.00
    CZ N1
    Sigma
    26/2/1999 Symposium ICT aan de KUN en daarbuiten
    ICT aan de KUN en daarbuiten
    10.00
    Collegezalencomplex, Mercatorpad 1
    IOWO

    Maart

    9/3/1999 Sport Spelletjesavond
    Spelletjesavond
    19.30
    Cafe de Fiets
    Sigma
    9/3/1999 Lezing Glas: helder en fascinerend, materie en fenomeen
    20:00
    Collegezalencomplex
    Sprekers: prof.dr. Carel L. Davidson en Louis Goosen
    Studium Generale
    10/3/1999 Feest Feest met A-faculteiten: Fantasy-Feest
    22.00
    Diogenes
    entree:
    Sigma, Thalia, Desda, Postelein, Svn en Sophia
    11/3/1999 Excursie Glas: Bezoek aan de glasinstrumentenmakerij van de KUN
    17:00
    Toernooiveld 1
    Inschrijven verplicht.
    Studium Generale
    11/3/1999 Cursus Wijnproeven
    17:00
    Marjolijn of Irene
    5,00
    Slechts een beperkt aantal mensen kan meedoen
    11/3/1999 Film South Park
    South Park
    19.30u
    CZ N2
    1,50
    St. Beet
    12/3/1999 Promotie 'Selection in neural information processing'
    13:30 Precies
    KUN Aula
    P.J.L.J. van de Laar
    17/3/1999 Symposium Sigma Symposium: Economie in de Erlenmyer
    Sigma Symposium: Economie in de Erlenmyer
    CZ N2
    Sigma
    17/3/1999 Lezing Metal Ion Cage Complexes: Synthesis, Reactivity and Uses
    Metal Ion Cage Complexes: Synthesis, Reactivity and Uses
    14:30
    Internet
    Chemweb
    17/3/1999 Lezing Victor Westhoff-lezing: "Over de toestand van van natuur en milieu in Nederland en daarbuiten"
    14:30
    KUN Aula
    KUN
    23/3/1999 Cursus Assessmenttraining
    Assessmenttraining
    10.30u-12.30u
    CZ N4
    BBB
    24/3/1999 Lezing De Engelse Industri
    14.00u
    CZ N6
    SCN99
    24/3/1999 Lezing Beheersing van afvalwaterlozing van de chemische industrie in UK
    14.00u
    CZ N6
    SCN99
    24/3/1999 Cursus Keltische Week: Whiskey Proeven
    16.00u
    Cultuur Cafe
    tel. 3615908
    5,-
    SPC
    31/3/1999 Vergadering Ledenvergadering
    20.00u
    Bovenzaal cafe de Fiets
    Sigma

    April

    1/4/1999 Excursie Grolsch
    Grolsch
    13.30
    Enschede
    Sigma
    1/4/1999 Lezing Studiereis lezing
    16.00u
    CZ N6
    SCN99
    1/4/1999 Deadline G-mi





    2/4/1999 Vrij Goede Vrijdag





    5/4/1999 Vrij 2e Paasdag





    5/4/1999 Sport Paaslympics
    Paaslympics





    St. Beet en BeeVee
    6/4/1999 Borrel W en N Paas-Beestborrel





    BBB en Leonardo
    6/4/1999 Film X-Files: Fight the Future
    X-Files: Fight the Future
    19.30u
    CZ N2
    1,50
    St. Beet
    7/4/1999 Cursus Programmeren in C
    Programmeren in C





    Sigma
    7/4/1999 Vergadering Subfaculteits bestuursvergadering
    16.15
    CZ III
    Na afloop borrel
    7/4/1999 Lezing Studiereis lezing
    13.30u
    CZ N1
    SCN99
    8/4/1999 Feest BeestFeest
    BeestFeest





    St. Beet
    9:11/4/1999 Kamp Weekendkamp





    Sigma
    13/4/1999 Sport Darten met Thalia
    Darten met Thalia
    All-In
    Thalia en Sigma
    13/4/1999 Promotie In vivo 13C MR spectroscopy for human investigations
    15:30 precies
    KUN-Aula/Congresgebouw
    Door: A.J.van den Bergh
    KUN
    16/4/1999 Uitreiking Van Melsenprijs





    Deze prijs wordt jaarlijks uitgereikt aan leerlingen uit 5-havo en 6-vwo, voor experimenten uitgevoerd voor het schoolonderzoek.
    B-Faculteit
    16/4/1999 Symposium Biologie symposium






    17/4/1999 Ouderdag Ouderdag





    Sigma
    21/4/1999 - 4/5/1999 Conferentie Molecular Simulation '99
    Molecular Simulation '99
    Internet
    GRATIS
    Internetconferentie met zeer veel (gratis) lezingen.
    VEI
    22/4/1999 Film Blade
    Blade
    19.30u
    CZ N2
    1,50
    In de pauze is er koffie en thee. Neem zelf een kopje mee!
    St. Beet
    22/4/1999 Feest Pre-Batavierenrace feest
    22.30u
    Kolpinghuis
    3,50
    met
    24+25/4/1999 Sport Batavierenrace
    Batavierenrace





    Schrijf je in bij een van de leden van de sportcie
    St. Batavierenrace
    27/4/1999 Snuffelweek op de B-faculteit
    11.00-16.30
    Grasveld B-faculteit
    Borrel om 15.00
    BeeVee, Desda, Leonardo, Marie-Curie, Sigma, Thalia
    28/4/1999 Show Karaokeshow
    16.00
    UL-kantine
    Sigma
    29/4/1999 Excursie Cap Gemini
    Cap Gemini
    12.00-16.30
    Sigma
    30/4/1999 Vrij Koninginnedag






    Mei

    5/5/1999 Bussines Course Uiterste inschrijfdatum Unilever B.C.
    Uiterste inschrijfdatum Unilever B.C.





    De Bussiness Course is van 6-9 juli.
    2:16/5/1999 Reis Studiereis naar Engeland en Schotland





    3:7/5/1999 Vrij Meivakantie





    11/5/1999 Lezing Chemische wapens
    Chemische wapens
    16.00
    CZN1
    Patricia Dankers
    Sigma
    11/5/1999 Film The Truman show
    The Truman show
    19.30u
    CZ N2
    1,50
    In de pauze is er koffie en thee. Neem zelf een kopje mee!
    St. Beet
    12/5/1999 Excursie Bijbels Openluchtmuseum
    Bijbels Openluchtmuseum
    's-middags
    Heilige Landstichting
    Irene Reynhout
    Sigma
    13+14/5/1999 Vrij Hemelvaart





    13+14/5/1999 Sport Open Nederlandse Chemie Sportdagen
    Open Nederlandse Chemie Sportdagen





    18/5/1999 Promotie NMR studies of fusarium solani pisi cutinase. Structure-mobility-function relationships
    13:30 precies
    KUN Aula
    Door J.J. Prompers
    KUN
    19/5/1999 Show Soundmixshow
    Cultureel Cafe
    19:21/5/1999 Colloquium The challenge of pragmatic process philosophy
    CZ II
    20/5/1999 Beurs BetaBedrijvenBeurs
    BetaBedrijvenBeurs
    Rondgang
    BBB
    20/5/1999 Feest Feest 2e jaars jaarraad
    21.00
    De Fiets
    Jaarraad '97
    21/5/1999 Info Voorlichtingsbijeenkomst studie Natuurwetenschappen
    12.45-13.45
    A3012
    Voor meer informatie neem contact op met:
    24/5/1999 Vrij Pinksteren





    25:29/5/1999 Het kriebelt 99
    Het kriebelt 99





    Voor meer info kijk in het programmaboekje, volg de link of bel Diogenes (3604842) of SPC (3612823)
    Diogenes & SPC
    26/5/1999 Borrel W & N Eindejaars-Beestborrel
    Thalia kantine
    Thalia
    26/5/1999 Film Freddie, de koele kikker
    Freddie, de koele kikker
    19.30u
    CZ N2
    1,50
    In de pauze is er koffie en thee. Neem zelf een kopje mee!
    St. Beet
    26/5/1999 Info Voorlichtingsbijeenkomst predoctorale lerarenopleiding
    16.00-17.00
    CZN6
    De bijeenkomst zal gaan over inhoud en werkwijze v/d predoc. lerarenopl.
    Unilo
    27/5/1999 Feest Beestfeest
    Beestfeest
    21.30-4.00
    Doornroosje
    Entree gratis
    St. Beet

    Juni

    3/6/1999 Info KNCV-bedrijfseconomiedag
    Nijmegen
    KNCV
    4/6/1999 Deadline G-mi





    7/6/1999 Borrel Eindejaars borrel
    16.00
    UL-kantine
    Met speciale Sigma-onthulling
    Sigma
    9/6/1999 Feest Feest 1e jaars jaarraad
    21.30
    cafe de Fiets
    Entree gratis
    Jaarraad '98
    14:18/6/1999 College Industriële Chemie
    10.00-15.00
    CZ I
    Meer informatie:
    22/6/1999 Barbecue
    Wylerbergmeer
    15,=
    Sigma
    25/6/1999 Lezing Dendritic Molecules, Concepts, Synthesis and Perspectives
    14.00-15.00
    CZ III
    Spreker:
    Nijmegen SON Research Institute

    Juli

    2/7/1999 Oratie Een magnetische blik op het leven
    15.00
    KUN-Aula
    Spreker:
    KUN
    6:9/7/1999 Bussines Course Unilever Bussines Course
    Unilever Bussines Course





    Inschrijving voor 5 mei.
    16/7/1999-31/8/1999 Vrij Zomervakantie





    Voor diegene die vakantie hebben:

    Augustus

    31/8/1999 Wedstrijd Essaywedstrijd voor studenten





    Studenten kunnen meedoen aan een essaywedstrijd. Hiervoor dient een wetenschappelijk essay te worden geschreven over het thema 'Met de ziel onder de arm. Over de lichamelijkheid van de geest'. Inleveren tot 1 september.
    KUN
    XML-XSLT-0.48/examples/grammar.xml0100644000076500007650000046717507120110022017071 0ustar jonathanjonathan
    Extensible Markup Language (XML) 1.0
    REC-xml-&iso6.doc.date;
    W3C Recommendation &draft.day;&draft.month;&draft.year;

    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf
    http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps
    http://www.w3.org/TR/REC-xml
    http://www.w3.org/TR/PR-xml-971208

    Tim Bray, Textuality and Netscape, tbray@textuality.com
    Jean Paoli, Microsoft, jeanpa@microsoft.com
    C. M. Sperberg-McQueen, University of Illinois at Chicago, cmsmcq@uic.edu

    The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

    This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

    This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

    This specification uses the term URI, which is defined by , a work in progress expected to update and .

    The list of known errors in this specification is available at http://www.w3.org/XML/xml-19980210-errata.

    Please report errors in this document to xml-editor@w3.org.

    Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium, XML Working Group, 1996, 1997.

    Created in electronic form.

    English
    Extended Backus-Naur Form (formal grammar)

    1997-12-03 : CMSMcQ : yet further changes

    1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)

    1997-12-02 : CMSMcQ : deal with as many corrections and comments from the proofreaders as possible: entify hard-coded document date in pubdate element, change expansion of entity WebSGML, update status description as per Dan Connolly (am not sure about reference to Berners-Lee et al.), add 'The' to abstract as per WG decision, move Relationship to Existing Standards to back matter and combine with References, re-order back matter so normative appendices come first, re-tag back matter so informative appendices are tagged informdiv1, remove XXX XXX from list of 'normative' specs in prose, move some references from Other References to Normative References, add RFC 1738, 1808, and 2141 to Other References (they are not normative since we do not require the processor to enforce any rules based on them), add reference to 'Fielding draft' (Berners-Lee et al.), move notation section to end of body, drop URIchar non-terminal and use SkipLit instead, lose stray reference to defunct nonterminal 'markupdecls', move reference to Aho et al. into appendix (Tim's right), add prose note saying that hash marks and fragment identifiers are NOT part of the URI formally speaking, and are NOT legal in system identifiers (processor 'may' signal an error). Work through: Tim Bray reacting to James Clark, Tim Bray on his own, Eve Maler, NOT DONE YET: change binary / text to unparsed / parsed. handle James's suggestion about < in attribute values uppercase hex characters, namechar list,

    1997-12-01 : JB : add some column-width parameters

    1997-12-01 : CMSMcQ : begin round of changes to incorporate recent WG decisions and other corrections: binding sources of character encoding info (27 Aug / 3 Sept), correct wording of Faust quotation (restore dropped line), drop SDD from EncodingDecl, change text at version number 1.0, drop misleading (wrong!) sentence about ignorables and extenders, modify definition of PCData to make bar on msc grammatical, change grammar's handling of internal subset (drop non-terminal markupdecls), change definition of includeSect to allow conditional sections, add integral-declaration constraint on internal subset, drop misleading / dangerous sentence about relationship of entities with system storage objects, change table body tag to htbody as per EM change to DTD, add rule about space normalization in public identifiers, add description of how to generate our name-space rules from Unicode character database (needs further work!).

    1997-10-08 : TB : Removed %-constructs again, new rules for PE appearance.

    1997-10-01 : TB : Case-sensitive markup; cleaned up element-type defs, lotsa little edits for style

    1997-09-25 : TB : Change to elm's new DTD, with substantial detail cleanup as a side-effect

    1997-07-24 : CMSMcQ : correct error (lost *) in definition of ignoreSectContents (thanks to Makoto Murata) Allow all empty elements to have end-tags, consistent with SGML TC (as per JJC).

    1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections: introduce the term 'empty-element tag', note that all empty elements may use it, and elements declared EMPTY must use it. Add WFC requiring encoding decl to come first in an entity. Redefine notations to point to PIs as well as binary entities. Change autodetection table by removing bytes 3 and 4 from examples with Byte Order Mark. Add content model as a term and clarify that it applies to both mixed and element content.

    1997-06-30 : CMSMcQ : change date, some cosmetic changes, changes to productions for choice, seq, Mixed, NotationType, Enumeration. Follow James Clark's suggestion and prohibit conditional sections in internal subset. TO DO: simplify production for ignored sections as a result, since we don't need to worry about parsers which don't expand PErefs finding a conditional section.

    1997-06-29 : TB : various edits

    1997-06-29 : CMSMcQ : further changes: Suppress old FINAL EDIT comments and some dead material. Revise occurrences of % in grammar to exploit Henry Thompson's pun, especially markupdecl and attdef. Remove RMD requirement relating to element content (?).

    1997-06-28 : CMSMcQ : Various changes for 1 July draft: Add text for draconian error handling (introduce the term Fatal Error). RE deleta est (changing wording from original announcement to restrict the requirement to validating parsers). Tag definition of validating processor and link to it. Add colon as name character. Change def of %operator. Change standard definitions of lt, gt, amp. Strip leading zeros from #x00nn forms.

    1997-04-02 : CMSMcQ : final corrections of editorial errors found in last night's proofreading. Reverse course once more on well-formed: Webster's Second hyphenates it, and that's enough for me.

    1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self

    1997-03-31 : Tim Bray : many changes

    1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling), some Charles Goldfarb, some ERB decisions (PE handling in miscellaneous declarations). Changed Ident element to accept def attribute. Allow normalization of Unicode characters. move def of systemliteral into section on literals.

    1997-03-28 : CMSMcQ : make as many corrections as possible, from Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson, Paul Grosso, and self. Among other things: give in on "well formed" (Terry is right), tentatively rename QuotedCData as AttValue and Literal as EntityValue to be more informative, since attribute values are the only place QuotedCData was used, and vice versa for entity text and Literal. (I'd call it Entity Text, but 8879 uses that name for both internal and external entities.)

    1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not' except in the one case where it meant 'may or may not'.

    1997-03-21 : TB : massive changes on plane flight from Chicago to Vancouver

    1997-03-21 : CMSMcQ : correct as many reported errors as possible.

    1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.

    1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for WWW conference April 1997: restore some of the internal entity references (e.g. to docdate, etc.), change character xA0 to &nbsp; and define nbsp as &#160;, and refill a lot of paragraphs for legibility.

    1996-11-12 : CMSMcQ : revise using Tim's edits: Add list type of NUMBERED and change most lists either to BULLETS or to NUMBERED. Suppress QuotedNames, Names (not used). Correct trivial-grammar doc type decl. Rename 'marked section' as 'CDATA section' passim. Also edits from James Clark: Define the set of characters from which [^abc] subtracts. Charref should use just [0-9] not Digit. Location info needs cleaner treatment: remove? (ERB question). One example of a PI has wrong pic. Clarify discussion of encoding names. Encoding failure should lead to unspecified results; don't prescribe error recovery. Don't require exposure of entity boundaries. Ignore white space in element content. Reserve entity names of the form u-NNNN. Clarify relative URLs. And some of my own: Correct productions for content model: model cannot consist of a name, so "elements ::= cp" is no good.

    1996-11-11 : CMSMcQ : revise for style. Add new rhs to entity declaration, for parameter entities.

    1996-11-10 : CMSMcQ : revise for style. Fix / complete section on names, characters. Add sections on parameter entities, conditional sections. Still to do: Add compatibility note on deterministic content models. Finish stylistic revision.

    1996-10-31 : TB : Add Entity Handling section

    1996-10-30 : TB : Clean up term & termdef. Slip in ERB decision re EMPTY.

    1996-10-28 : TB : Change DTD. Implement some of Michael's suggestions. Change comments back to //. Introduce language for XML namespace reservation. Add section on white-space handling. Lots more cleanup.

    1996-10-24 : CMSMcQ : quick tweaks, implement some ERB decisions. Characters are not integers. Comments are /* */ not //. Add bibliographic refs to 10646, HyTime, Unicode. Rename old Cdata as MsData since it's only seen in marked sections. Call them attribute-value pairs not name-value pairs, except once. Internal subset is optional, needs '?'. Implied attributes should be signaled to the app, not have values supplied by processor.

    1996-10-16 : TB : track down & excise all DSD references; introduce some EBNF for entity declarations.

    1996-10-?? : TB : consistency check, fix up scraps so they all parse, get formatter working, correct a few productions.

    1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and organizational changes: Replace a few literals with xmlpio and pic entities, to make them consistent and ensure we can change pic reliably when the ERB votes. Drop paragraph on recognizers from notation section. Add match, exact match to terminology. Move old 2.2 XML Processors and Apps into intro. Mention comments, PIs, and marked sections in discussion of delimiter escaping. Streamline discussion of doctype decl syntax. Drop old section of 'PI syntax' for doctype decl, and add section on partial-DTD summary PIs to end of Logical Structures section. Revise DSD syntax section to use Tim's subset-in-a-PI mechanism.

    1996-10-10 : TB : eliminate name recognizers (and more?)

    1996-10-09 : CMSMcQ : revise for style, consistency through 2.3 (Characters)

    1996-10-09 : CMSMcQ : re-unite everything for convenience, at least temporarily, and revise quickly

    1996-10-08 : TB : first major homogenization pass

    1996-10-08 : TB : turn "current" attribute on div type into CDATA

    1996-10-02 : TB : remould into skeleton + entities

    1996-09-30 : CMSMcQ : add a few more sections prior to exchange with Tim.

    1996-09-20 : CMSMcQ : finish transcribing notes.

    1996-09-19 : CMSMcQ : begin transcribing notes for draft.

    1996-09-13 : CMSMcQ : made outline from notes of 09-06, do some housekeeping
    Introduction

    Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents.

    XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

    A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

    Origin and Goals

    XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

    The design goals for XML are:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness in XML markup is of minimal importance.

    This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.

    This version of the XML specification may be distributed freely, as long as all text and legal notices remain intact.

    Terminology

    The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

    may: Conforming documents and XML processors are permitted to but need not behave as described.

    must: Conforming documents and XML processors are required to behave as described; otherwise they are in error.

    error: A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

    fatal error: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

    at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

    validity constraint: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must, at user option, be reported by validating XML processors.

    well-formedness constraint: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.

    match: (Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. At user option, processors may normalize such characters to some canonical form. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production. (Of content and content models:) An element matches its declaration when it conforms in the fashion described in the constraint Element Valid.

    for compatibility: A feature of XML included solely to ensure that XML remains compatible with SGML.

    for interoperability: A non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879.

    Documents

    A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.

    Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in the section on well-formed parsed entities.

    Well-Formed XML Documents

    A textual object is a well-formed XML document if:

    Taken as a whole, it matches the production labeled document.

    It meets all the well-formedness constraints given in this specification.

    Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

    Document

        document ::= prolog element Misc*

    Matching the document production implies that:

    It contains one or more elements.

    There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

    As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.
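    The nesting rules above can be observed with an off-the-shelf parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (backed by the non-validating expat parser, which implements a later edition of XML than this draft); the helper names is_well_formed and parent_map are hypothetical and not part of this specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def parent_map(root):
    """Map every non-root element C to its unique parent element P."""
    return {child: parent for parent in root.iter() for child in parent}


# Properly nested start- and end-tags are accepted.
print(is_well_formed("<a><b></b></a>"))
# Overlapping elements and a second root element are both rejected.
print(is_well_formed("<a><b></a></b>"))
print(is_well_formed("<a/><b/>"))
```

    Because each non-root element has exactly one parent, parent_map can be a plain dictionary without losing information.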

    Characters

    A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of "compatibility characters", as defined in section 6.8 of the Unicode standard, is discouraged.

    Character Range

        Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
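    As an illustration, the Char production translates directly into a code-point range test. This is a sketch only; the function name is_xml_char is an assumption made for the example.

```python
def is_xml_char(ch: str) -> bool:
    """Test a single character against the Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to the start of the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # characters beyond the BMP
    )


def first_illegal_char(text: str) -> int:
    """Return the index of the first character not matching Char, or -1."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return -1
```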

    The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in the section on character encoding in entities.

    Common Syntactic Constructs

    This section defines some symbols used widely in the grammar.

    S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

    White Space

        S ::= (#x20 | #x9 | #xD | #xA)+

    Characters are classified for convenience as letters, digits, or other characters. Letters consist of an alphabetic or syllabic base character possibly followed by one or more combining characters, or of an ideographic character. Full definitions of the specific characters in each class are given in the appendix on character classes.

    A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

    The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

    An Nmtoken (name token) is any mixture of name characters.

    Names and Tokens

        NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
        Name ::= (Letter | '_' | ':') (NameChar)*
        Names ::= Name (S Name)*
        Nmtoken ::= (NameChar)+
        Nmtokens ::= Nmtoken (S Nmtoken)*
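    The Name and Nmtoken productions can be approximated with regular expressions. The sketch below is deliberately restricted to ASCII: it assumes Letter is [A-Za-z] and Digit is [0-9] and omits CombiningChar and Extender, so it accepts only a subset of the names the full grammar allows. All function names are hypothetical.

```python
import re

# ASCII-only approximation of NameChar (Letter | Digit | '.' | '-' | '_' | ':').
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME_RE = re.compile(r"[A-Za-z_:]" + NAME_CHAR + r"*")
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")


def is_name(text: str) -> bool:
    """True if the whole string matches the (ASCII-restricted) Name production."""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """True if the whole string matches the (ASCII-restricted) Nmtoken production."""
    return NMTOKEN_RE.fullmatch(text) is not None


def is_reserved_name(text: str) -> bool:
    """True if the name begins with 'xml' in any combination of case."""
    return text[:3].lower() == "xml"
```

    Every Name is also an Nmtoken, but not the reverse: a token such as 1abc may begin with a digit, which a Name may not.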

    Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

    Literals

        EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
        AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
        SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
        PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
        PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

    Character Data and Markup

    Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions.

    All text that is not markup constitutes the character data of the document.

    The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration; see the section on well-formed parsed entities. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

    In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

    To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

    Character Data

        CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
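    The escaping rules in this section can be sketched with the escape and unescape helpers from Python's standard xml.sax.saxutils module. The wrapper names escape_char_data and escape_att_value are assumptions made for this example; escaping both quote characters in attribute values is a conservative choice, not a requirement.

```python
from xml.sax.saxutils import escape, unescape


def escape_char_data(text: str) -> str:
    """Escape '&', '<' and '>' so the text is safe as element content."""
    return escape(text)


def escape_att_value(text: str) -> str:
    """Additionally escape both quote characters for use inside an attribute value."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})


print(escape_char_data("a < b & c"))
print(escape_att_value("it's \"quoted\""))
```

    Since escape always rewrites ">" as "&gt;", the string "]]>" can never survive unescaped in the output, which satisfies the compatibility rule above.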

    Comments

    Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.

    Comments

        Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

    An example of a comment: <!-- declarations for <head> & <body> -->
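    The Comment production maps almost one-to-one onto a regular expression. The following sketch assumes that the Char restriction is checked elsewhere and only enforces the double-hyphen rule; the name is_comment is hypothetical.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# [^-] stands in for (Char - '-'); legality of each character is not checked here.
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text: str) -> bool:
    """True if the whole string is a comment with no '--' in its body."""
    return COMMENT_RE.fullmatch(text) is not None


print(is_comment("<!-- declarations for <head> & <body> -->"))
print(is_comment("<!-- a -- b -->"))
```

    Note that the grammar also rules out a comment body ending in a hyphen, so a string such as <!--a---> does not match.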

    Processing Instructions

    Processing instructions (PIs) allow documents to contain instructions for applications.

    Processing Instructions

        PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
        PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

    PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets.

    CDATA Sections

    CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":

    CDATA Sections

        CDSect ::= CDStart CData CDEnd
        CDStart ::= '<![CDATA['
        CData ::= (Char* - (Char* ']]>' Char*))
        CDEnd ::= ']]>'

    Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

    An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup: <![CDATA[<greeting>Hello, world!</greeting>]]>
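    The effect of a CDATA section can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree; the wrapping element name example and the helper cdata_wrap are hypothetical. Because CData cannot contain "]]>", the helper uses the common technique of splitting that string across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA sections, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


doc = "<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>"
root = ET.fromstring(doc)

# The tags inside the CDATA section arrive as character data, not as a child element.
print(root.text)
print(len(root))
```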

    Prolog and Document Type Declaration

    XML documents may, and should, begin with an XML declaration which specifies the version of XML being used. For example, the following is a complete XML document, well-formed but not valid:

        <?xml version="1.0"?>
        <greeting>Hello, world!</greeting>

    and so is this:

        <greeting>Hello, world!</greeting>

    The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML working group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.

    The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.

    The document type declaration must appear before the first element in the document.

    Prolog

        prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
        XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
        VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ")
        Eq ::= S? '=' S?
        VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
        Misc ::= Comment | PI | S

    The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.

    A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration. These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For fuller information, see the section on physical structures.

    Document Type Definition

        doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
        markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment

    The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.

    Validity Constraint: Root Element Type

    The Name in the document type declaration must match the element type of the root element.

    Validity Constraint: Proper Declaration/PE Nesting

    Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.

    Well-Formedness Constraint: PEs in Internal Subset

    In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

    Like the internal subset, the external subset and any external parameter entities referred to in the DTD must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.

    External Subset

        extSubset ::= TextDecl? extSubsetDecl
        extSubsetDecl ::= ( markupdecl | conditionalSect | PEReference | S )*

    The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.

    An example of an XML document with a document type declaration:

        <?xml version="1.0"?>
        <!DOCTYPE greeting SYSTEM "hello.dtd">
        <greeting>Hello, world!</greeting>

    The system identifier "hello.dtd" gives the URI of a DTD for the document.

    The declarations can also be given locally, as in this example:

        <?xml version="1.0" encoding="UTF-8" ?>
        <!DOCTYPE greeting [
          <!ELEMENT greeting (#PCDATA)>
        ]>
        <greeting>Hello, world!</greeting>

    If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.

    Standalone Document Declaration

    Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity.

    Standalone Document Declaration

        SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

    In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a document, of references to external entities, when those entities are internally declared, does not change its standalone status.

    If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.

    Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

    Validity Constraint: Standalone Document Declaration

    The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

    attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or

    entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or

    attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or

    element types with element content, if white space occurs directly within any instance of those types.

    An example XML declaration with a standalone document declaration: <?xml version="1.0" standalone='yes'?>
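    As a sketch of how the SDDecl production might be recognized, the regular expression below follows the production term by term: leading white space (S), the keyword, Eq, and a quoted "yes" or "no". It assumes the input is the text of an XML declaration that has already been isolated; the name standalone_value is hypothetical.

```python
import re

S = r"[ \t\r\n]+"
EQ = r"[ \t\r\n]*=[ \t\r\n]*"
SDDECL_RE = re.compile(S + "standalone" + EQ + r"""(?:'(yes|no)'|"(yes|no)")""")


def standalone_value(xml_decl: str):
    """Return 'yes' or 'no' from an XML declaration, or None if no SDDecl is present."""
    match = SDDECL_RE.search(xml_decl)
    if match is None:
        return None
    return match.group(1) or match.group(2)


print(standalone_value("<?xml version=\"1.0\" standalone='yes'?>"))
```

    A caller that finds no declaration (None) would, per the rule above, assume "no" whenever external markup declarations exist.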

    White Space Handling

    In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

    An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

    A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve". For example:

        <!ATTLIST poem xml:space (default|preserve) 'preserve'>

    The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

    The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.

    End-of-Line Handling

    XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

    To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can conveniently be produced by normalizing all line breaks to #xA on input, before parsing.)
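    The parenthetical remark above, normalizing all line breaks before parsing, amounts to two string replacements. A minimal sketch follows; the function name is an assumption.

```python
def normalize_line_breaks(text: str) -> str:
    """Translate each #xD#xA pair and each standalone #xD into a single #xA."""
    # The two-character sequence must be handled first, otherwise it would
    # be turned into two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_breaks("one\r\ntwo\rthree\nfour")))
```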

    Language Identification

    In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by RFC 1766, "Tags for the Identification of Languages":

    Language Identification

        LanguageID ::= Langcode ('-' Subcode)*
        Langcode ::= ISO639Code | IanaCode | UserCode
        ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
        IanaCode ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
        UserCode ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
        Subcode ::= ([a-z] | [A-Z])+

    The Langcode may be any of the following:

    a two-letter language code as defined by ISO 639, "Codes for the representation of names of languages"

    a language identifier registered with the Internet Assigned Numbers Authority; these begin with the prefix "i-" (or "I-")

    a language identifier assigned by the user, or agreed on between parties in private use; these must begin with the prefix "x-" or "X-" in order to ensure that they do not conflict with names later standardized or registered with IANA

    There may be any number of Subcode segments; if the first subcode segment exists and the Subcode consists of two letters, then it must be a country code from ISO 3166, "Codes for the representation of names of countries." If the first subcode consists of more than two letters, it must be a subcode for the language in question registered with IANA, unless the Langcode begins with the prefix "x-" or "X-".

    It is customary to give the language code in lower case, and the country code (if any) in upper case. Note that these values, unlike other names in XML documents, are case insensitive.

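    For example:

        <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
        <p xml:lang="en-GB">What colour is it?</p>
        <p xml:lang="en-US">What color is it?</p>
        <sp who="Faust" desc='leise' xml:lang="de">
          <l>Habe nun, ach! Philosophie,</l>
          <l>Juristerei, und Medizin</l>
          <l>und leider auch Theologie</l>
          <l>durchaus studiert mit heißem Bemüh'n.</l>
        </sp>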

    The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
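    The LanguageID grammar given in this section is regular, so a syntactic check can be sketched with one regular expression. The sketch only tests the shape of the identifier; it does not consult the ISO 639, ISO 3166, or IANA registries. The names is_language_id and same_language are hypothetical.

```python
import re

# Langcode ::= ISO639Code | IanaCode | UserCode
LANGCODE = r"(?:[A-Za-z]{2}|[iI]-[A-Za-z]+|[xX]-[A-Za-z]+)"
# LanguageID ::= Langcode ('-' Subcode)*
LANGUAGE_ID_RE = re.compile(LANGCODE + r"(?:-[A-Za-z]+)*")


def is_language_id(value: str) -> bool:
    """True if the value has the syntactic shape of a LanguageID."""
    return LANGUAGE_ID_RE.fullmatch(value) is not None


def same_language(a: str, b: str) -> bool:
    """Compare two language identifiers, ignoring case as this section requires."""
    return a.lower() == b.lower()


print(is_language_id("en-GB"), is_language_id("eng"))
```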

    A simple declaration for xml:lang might take the form

        xml:lang NMTOKEN #IMPLIED

    but specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:

        <!ATTLIST poem   xml:lang NMTOKEN 'fr'>
        <!ATTLIST gloss  xml:lang NMTOKEN 'en'>
        <!ATTLIST note   xml:lang NMTOKEN 'en'>

    Logical Structures

    Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.

    Element

        element ::= EmptyElemTag | STag content ETag

    This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.

    Well-Formedness Constraint: Element Type Match

    The Name in an element's end-tag must match the element type in the start-tag.

    Validity Constraint: Element Valid

    An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

    The declaration matches EMPTY and the element has no content.

    The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between each pair of child elements.

    The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.

    The declaration matches ANY, and the types of any child elements have been declared.
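    The second case above (a declaration matching children) can be illustrated by treating the content model as a regular expression over the sequence of child element types. In this sketch the content model (head, (p | list)*) and its hand-written translation MODEL are hypothetical, the parser is Python's standard xml.etree.ElementTree, and no DTD is actually read.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model (head, (p | list)*), translated by hand into a
# regex over child names, each name terminated by a single space.
MODEL = r"head (?:p |list )*"


def has_only_white_space(element) -> bool:
    """True if the only character data between child elements is white space."""
    texts = [element.text] + [child.tail for child in element]
    return all(t is None or t.strip(" \t\r\n") == "" for t in texts)


def matches_children(element, model_regex: str) -> bool:
    """Check the sequence of child element types against a content-model regex."""
    sequence = "".join(child.tag + " " for child in element)
    return has_only_white_space(element) and re.fullmatch(model_regex, sequence) is not None


doc = ET.fromstring("<doc> <head/>\n<p/><list/><p/></doc>")
print(matches_children(doc, MODEL))
```

    A space is a safe terminator here because element names cannot contain white space.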

    Start-Tags, End-Tags, and Empty-Element Tags

    The beginning of every non-empty XML element is marked by a start-tag.

    Start-tag

        STag ::= '<' Name (S Attribute)* S? '>'
        Attribute ::= Name Eq AttValue

    The Name in the start- and end-tags gives the element's type. The Name-AttValue pairs are referred to as the attribute specifications of the element, with the Name in each pair referred to as the attribute name and the content of the AttValue (the text between the ' or " delimiters) as the attribute value.

    Well-Formedness Constraint: Unique Att Spec

    No attribute name may appear more than once in the same start-tag or empty-element tag.

    Validity Constraint: Attribute Value Type

    The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see the section on attribute-list declarations.)

    Well-Formedness Constraint: No External Entity References

    Attribute values cannot contain direct or indirect entity references to external entities.

    Well-Formedness Constraint: No < in Attribute Values

    The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.
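    Two of the start-tag constraints above, unique attribute names and no literal < in attribute values, are enforced by ordinary non-validating parsers. The sketch below assumes Python's standard xml.etree.ElementTree; the helper name start_tag_error is hypothetical, and the exact wording of the error messages depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def start_tag_error(text: str):
    """Return the parser's error message, or None if the text is accepted."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A repeated attribute name and a literal '<' in a value are both rejected.
print(start_tag_error('<a x="1" x="2"/>'))
print(start_tag_error('<a x="<"/>'))
# The escaped form is accepted.
print(start_tag_error('<a x="&lt;"/>'))
```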

    An example of a start-tag: <termdef id="dt-dog" term="dog">

    The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:

    End-tag
    ETag ::= '</' Name S? '>'

    An example of an end-tag: </termdef>

    The text between the start-tag and end-tag is called the element's content:

    Content of Elements
    content ::= (element | CharData | Reference | CDSect | PI | Comment)*

    If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form:

    Tags for Empty Elements
    EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

    Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag must be used, and can only be used, for elements which are declared EMPTY.

    Examples of empty elements:

    <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
    <br></br>
    <br/>
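    The two representations of an empty element, and the requirement that an end-tag echo the start-tag's name, can be checked as follows. This is a sketch only, assuming Python's standard xml.etree.ElementTree; the helper names describe and parses are hypothetical.

```python
import xml.etree.ElementTree as ET


def describe(document):
    """Summarise the root element: type, attributes, text, child count."""
    root = ET.fromstring(document)
    return (root.tag, dict(root.attrib), root.text, len(root))


def parses(document):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A start-tag immediately followed by an end-tag and an empty-element tag
# describe the same empty element.
pair_form = describe("<br></br>")
empty_form = describe("<br/>")

# Element Type Match: the end-tag name must match the start-tag name.
mismatched = parses("<termdef></term>")
```

    Both forms yield an element of type br with no attributes and no content, while the mismatched end-tag is reported as an error.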

    Element Type Declarations

    The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

    Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

    An element type declaration takes the form:

    Element Type Declaration
    elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
    contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

    where the Name gives the element type being declared.

    Unique Element Type Declaration

    No element type may be declared more than once.

    Examples of element type declarations:

    <!ELEMENT br EMPTY>
    <!ELEMENT p (#PCDATA|emph)* >
    <!ELEMENT %name.para; %content.para; >
    <!ELEMENT container ANY>

    Element Content

    An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear. The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

    Element-content Models
    children ::= (choice | seq) ('?' | '*' | '+')?
    cp ::= (Name | choice | seq) ('?' | '*' | '+')?
    choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'
    seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'

    where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

    The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see .

    Proper Group/PE Nesting

    Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

    For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

    Examples of element-content models:

    <!ELEMENT spec (front, body, back?)>
    <!ELEMENT div1 (head, (p | list | note)*, div2*)>
    <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
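    Because a content model is a regular expression over element types, matching a sequence of children against it can be sketched by translating the model into a conventional regular expression. The sketch below assumes Python's re module and handles only names, parentheses, and the operators , | ? * +; it does not expand parameter-entity references or check determinism, and the function names are hypothetical.

```python
import re

# Tokens of a content model: element type names and the grammar's operators.
TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[()|,?*+]")


def content_model_to_regex(model):
    """Translate an element-content model into a compiled regular expression.

    Each child element is represented in the subject string by its type
    name followed by one space, so a sequence is plain concatenation.
    """
    parts = []
    for token in TOKEN.findall(model):
        if token == ",":
            continue                      # sequence: nothing to emit
        elif token == "(":
            parts.append("(?:")           # group without capturing
        elif token in ")|?*+":
            parts.append(token)           # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, child_types):
    """Return True if the list of child element types matches the model."""
    subject = "".join(name + " " for name in child_types)
    return content_model_to_regex(model).fullmatch(subject) is not None
```

    For instance, children_match("(front, body, back?)", ["front", "body"]) holds, while the same children in the opposite order do not match.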

    Mixed Content

    An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements. In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:

    Mixed-content Declaration
    Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

    where the Names give the types of elements that may appear as children.

    No Duplicate Types

    The same name must not appear more than once in a single mixed-content declaration.

    Examples of mixed content declarations:

    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
    <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
    <!ELEMENT b (#PCDATA)>

    Attribute-List Declarations

    Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in Start-Tags, End-Tags, and Empty-Element Tags. Attribute-list declarations may be used:

    To define the set of attributes pertaining to a given element type.

    To establish type constraints for these attributes.

    To provide default values for attributes.

    Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:

    Attribute-list Declaration
    AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
    AttDef ::= S Name S AttType S DefaultDecl

    The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

    When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.
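    The merging rule can be observed with a non-validating parser that reads the internal subset. The following is a sketch, assuming Python's standard xml.etree.ElementTree (expat); the second declaration and the attribute name compact are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two attribute-list declarations for the same element type. The attribute
# "type" is defined twice; "compact" is a hypothetical second attribute.
DOCUMENT = """<!DOCTYPE list [
  <!ATTLIST list type CDATA "ordered">
  <!ATTLIST list type CDATA "bullets" compact CDATA "yes">
]>
<list/>"""

# The declarations are merged, and the first definition of "type" is binding.
merged_defaults = dict(ET.fromstring(DOCUMENT).attrib)
```

    The element receives the default "ordered" from the first definition of type, together with the default for compact contributed by the second declaration.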

    Attribute Types

    XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints, as noted:

    Attribute Types
    AttType ::= StringType | TokenizedType | EnumeratedType
    StringType ::= 'CDATA'
    TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

    ID

    Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

    One ID per Element Type

    No element type may have more than one ID attribute specified.

    ID Attribute Default

    An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

    IDREF

    Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

    Entity Name

    Values of type ENTITY must match the Name production, values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

    Name Token

    Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.

    Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types:

    Enumerated Attribute Types
    EnumeratedType ::= NotationType | Enumeration
    NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
    Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'

    A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

    Notation Attributes

    Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

    Enumeration

    Values of this type must match one of the Nmtoken tokens in the declaration.

    For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

    Attribute Defaults

    An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

    Attribute Defaults
    DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

    In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.

    Required Attribute

    If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

    Attribute Default Legal

    The declared default value must meet the lexical constraints of the declared attribute type.

    Fixed Attribute Default

    If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.

    Examples of attribute-list declarations:

    <!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
    <!ATTLIST list type (bullets|ordered|glossary) "ordered">
    <!ATTLIST form method CDATA #FIXED "POST">
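    The effect of the four kinds of default declaration on a single attribute can be sketched as a small function. This is an illustration only: the function name resolve_attribute and the (keyword, value) tuple used to model a DefaultDecl are assumptions, and the errors raised correspond to validity constraints rather than well-formedness.

```python
def resolve_attribute(default_decl, specified=None):
    """Return the effective attribute value, or None if there is none.

    default_decl is a (keyword, default_value) pair modelling DefaultDecl:
    ("#REQUIRED", None), ("#IMPLIED", None), ("#FIXED", "POST") or
    (None, "ordered") for a plain default. specified is the value written
    in the document, or None if the attribute was omitted.
    """
    keyword, default = default_decl
    if keyword == "#REQUIRED":
        if specified is None:
            raise ValueError("required attribute is missing")
        return specified
    if keyword == "#IMPLIED":
        return specified                  # no default value is provided
    if keyword == "#FIXED":
        if specified is not None and specified != default:
            raise ValueError("attribute must match its fixed default")
        return default
    # Plain default: behave as though the attribute were present.
    return default if specified is None else specified
```

    With the declarations above, resolve_attribute((None, "ordered")) gives "ordered" for a list element that omits type, and resolve_attribute(("#FIXED", "POST"), "GET") reports a violation.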

    Attribute-Value Normalization

    Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:

    a character reference is processed by appending the referenced character to the attribute value

    an entity reference is processed by recursively processing the replacement text of the entity

    a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity

    other characters are processed by appending them to the normalized value

    If the declared value is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

    All attributes for which no declaration has been read should be treated by a non-validating parser as if declared CDATA.
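    The difference between CDATA and non-CDATA normalization can be observed directly. The sketch below assumes Python's standard xml.etree.ElementTree (expat); the attribute names undeclared and tokens are hypothetical. Both attributes are written with the same literal value containing a tab and a line feed.

```python
import xml.etree.ElementTree as ET

# "tokens" is declared NMTOKENS; "undeclared" has no declaration and is
# therefore treated as CDATA by a non-validating parser.
DOCUMENT = (
    "<!DOCTYPE a [<!ATTLIST a tokens NMTOKENS #IMPLIED>]>"
    '<a undeclared=" p\tq\n r " tokens=" p\tq\n r "/>'
)


def normalized_attributes(document):
    """Return the root element's attributes as normalized by the parser."""
    return dict(ET.fromstring(document).attrib)


result = normalized_attributes(DOCUMENT)
```

    For the CDATA case each white space character becomes a single space, so leading, trailing, and doubled spaces survive; for the NMTOKENS case the value is additionally trimmed and runs of spaces are collapsed.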

    Conditional Sections

    Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.

    Conditional Section
    conditionalSect ::= includeSect | ignoreSect
    includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
    ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
    ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
    Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

    Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

    If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. Note that for reliable parsing, the contents of even ignored conditional sections must be read in order to detect nested conditional sections and ensure that the end of the outermost (ignored) conditional section is properly detected. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored.

    If the keyword of the conditional section is a parameter-entity reference, the parameter entity must be replaced by its content before the processor decides whether to include or ignore the conditional section.

    An example:

    <!ENTITY % draft 'INCLUDE' >
    <!ENTITY % final 'IGNORE' >

    <![%draft;[
    <!ELEMENT book (comments*, title, body, supplements?)>
    ]]>
    <![%final;[
    <!ELEMENT book (title, body, supplements?)>
    ]]>
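    The need to read ignored sections in order to find their true end can be sketched as a scan that counts nested openings and closings. This is a minimal illustration following the ignoreSectContents production; the function name is hypothetical, and the scan deliberately looks only for the '<![' and ']]>' delimiters.

```python
def end_of_conditional_section(dtd, start):
    """Return the index just past the ']]>' that closes the section at start.

    Nested conditional sections increase the depth, so an inner ']]>'
    does not terminate the outer section.
    """
    if not dtd.startswith("<![", start):
        raise ValueError("no conditional section starts here")
    depth = 0
    position = start
    while position < len(dtd):
        if dtd.startswith("<![", position):
            depth += 1
            position += 3
        elif dtd.startswith("]]>", position):
            depth -= 1
            position += 3
            if depth == 0:
                return position
        else:
            position += 1
    raise ValueError("unterminated conditional section")


SUBSET = (
    "<![IGNORE[ <![INCLUDE[ <!ELEMENT a ANY> ]]> <!ELEMENT b ANY> ]]>"
    "<!ELEMENT c ANY>"
)
remainder = SUBSET[end_of_conditional_section(SUBSET, 0):]
```

    Here the inner INCLUDE section is skipped along with the outer IGNORE section, and only the declaration of c remains to be processed.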

    Physical Structures

    An XML document may consist of one or many storage units. These are called entities; they all have content and are all (except for the document entity, see below, and the external DTD subset) identified by name. Each XML document has one entity called the document entity, which serves as the starting point for the XML processor and may contain the whole document.

    Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

    An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.

    Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of ENTITY or ENTITIES attributes.

    General entities are entities for use within the document content. In this specification, general entities are sometimes referred to with the unqualified term entity when this leads to no ambiguity. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts. Furthermore, they occupy different namespaces; a parameter entity and a general entity with the same name are two distinct entities.

    Character and Entity References

    A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

    Character Reference
    CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

    Legal Character

    Characters referred to using character references must match the production for Char.

    If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.
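    Decoding a character reference amounts to reading the digits in the indicated base and checking the result against the Char production. The following sketch assumes Python; the function names are hypothetical, and the ranges in is_xml_char restate the Char production given elsewhere in this specification.

```python
import re

CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_xml_char(code_point):
    """Return True if the code point matches the Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def decode_char_ref(reference):
    """Return the character denoted by a decimal or hexadecimal reference."""
    match = CHAR_REF.fullmatch(reference)
    if match is None:
        raise ValueError("not a character reference: %r" % reference)
    hex_digits, decimal_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(decimal_digits)
    if not is_xml_char(code_point):
        raise ValueError("reference to a character that is not a legal Char")
    return chr(code_point)
```

    Both decode_char_ref("&#x3C;") and decode_char_ref("&#60;") denote the less-than character, whereas a reference such as "&#0;" is rejected.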

    An entity reference refers to the content of a named entity. References to parsed general entities use ampersand (&) and semicolon (;) as delimiters. Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.

    Entity Reference
    Reference ::= EntityRef | CharRef
    EntityRef ::= '&' Name ';'
    PEReference ::= '%' Name ';'

    Entity Declared

    In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

    Entity Declared

    In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in Predefined Entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration.

    Parsed Entity

    An entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of type ENTITY or ENTITIES.

    No Recursion

    A parsed entity must not contain a recursive reference to itself, either directly or indirectly.
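    An indirect violation of the No Recursion constraint can be demonstrated with two entities that refer to each other. This is a sketch assuming Python's standard xml.etree.ElementTree (expat); the entity names a and b are invented for the example.

```python
import xml.etree.ElementTree as ET

# Entity "a" refers to "b", which refers back to "a".
RECURSIVE = """<!DOCTYPE d [
  <!ENTITY a "&b;">
  <!ENTITY b "&a;">
]>
<d>&a;</d>"""

try:
    ET.fromstring(RECURSIVE)
    recursion_detected = False
except ET.ParseError:
    # The parser refuses to expand an entity that is already being expanded.
    recursion_detected = True
```

    The declarations alone are harmless; the error arises only when a reference to one of the entities is actually expanded.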

    In DTD

    Parameter-entity references may only appear in the DTD.

    Examples of character and entity references:

    Type <key>less-than</key> (&#x3C;) to save options.
    This document was prepared on &docdate; and is classified &security-level;.

    Example of a parameter-entity reference: %ISOLat2;

    Entity Declarations

    Entities are declared thus:

    Entity Declaration
    EntityDecl ::= GEDecl | PEDecl
    GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
    PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
    EntityDef ::= EntityValue | (ExternalID NDataDecl?)
    PEDef ::= EntityValue | ExternalID

    The Name identifies the entity in an entity reference or, in the case of an unparsed entity, in the value of an ENTITY or ENTITIES attribute. If the same entity is declared more than once, the first declaration encountered is binding; at user option, an XML processor may issue a warning if entities are declared multiple times.

    Internal Entities

    If the entity definition is an EntityValue, the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration. Note that some processing of entity and character references in the literal entity value may be required to produce the correct replacement text: see Construction of Internal Entity Replacement Text.

    An internal entity is a parsed entity.

    Example of an internal entity declaration: <!ENTITY Pub-Status "This is a pre-release of the specification.">

    External Entities

    If the entity is not internal, it is an external entity, declared as follows:

    External Entity Declaration
    ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
    NDataDecl ::= S 'NDATA' S Name

    If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.

    Notation Declared

    The Name must match the declared name of a notation.

    The SystemLiteral is called the entity's system identifier. It is a URI, which may be used to retrieve the entity. Note that the hash mark (#) and fragment identifier frequently used with URIs are not, formally, part of the URI itself; an XML processor may signal an error if a fragment identifier is given as part of a system identifier. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element type defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs. A URI might thus be relative to the document entity, to the entity containing the external DTD subset, or to some other external parameter entity.

    An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
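    The escaping rule for non-ASCII characters can be written in a few lines. The sketch below assumes Python; the function name and the sample system identifier are hypothetical.

```python
def escape_non_ascii(uri):
    """Escape each non-ASCII character as the %HH form of its UTF-8 bytes."""
    pieces = []
    for character in uri:
        if ord(character) < 128:
            pieces.append(character)      # ASCII is passed through unchanged
        else:
            for byte in character.encode("utf-8"):
                pieces.append("%%%02X" % byte)
    return "".join(pieces)


# U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
escaped = escape_non_ascii("boilerplate/caf\u00e9.xml")
```

    Characters that are already ASCII, including any existing % escapes, are left as they are; only the non-ASCII characters are converted.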

    In addition to a system identifier, an external identifier may include a public identifier. An XML processor attempting to retrieve the entity's content may use the public identifier to try to generate an alternative URI. If the processor is unable to do so, it must use the URI specified in the system literal. Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
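    Normalizing a public identifier before matching is a matter of collapsing white space. A minimal sketch, assuming Python's re module; the function name is hypothetical, and white space is taken to be the characters matched by the nonterminal S.

```python
import re


def normalize_public_id(public_id):
    """Collapse runs of white space to one space and trim both ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", public_id)
    return collapsed.strip(" ")


normalized = normalize_public_id(
    "  -//Textuality//TEXT  Standard\n open-hatch boilerplate//EN "
)
```

    Two public identifiers that differ only in their white space therefore compare equal after normalization.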

    Examples of external entity declarations:

    <!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml">
    <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

    Parsed Entities

    The Text Declaration

    External parsed entities may each begin with a text declaration.

    Text Declaration
    TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

    The text declaration must be provided literally, not by reference to a parsed entity. No text declaration may appear at any position other than the beginning of an external parsed entity.

    Well-Formed Parsed Entities

    The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. An external parameter entity is well-formed if it matches the production labeled extPE.

    Well-Formed External Parsed Entity
    extParsedEnt ::= TextDecl? content
    extPE ::= TextDecl? extSubsetDecl

    An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition.

    A consequence of well-formedness in entities is that the logical and physical structures in an XML document are properly nested; no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

    Character Encoding in Entities

    Each external parsed entity in an XML document may use a different encoding for its characters. All XML processors must be able to read entities in either UTF-8 or UTF-16.

    Entities encoded in UTF-16 must begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.

    Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration:

    Encoding Declaration
    EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
    EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */

    In the document entity, the encoding declaration is part of the XML declaration. The EncName is the name of the encoding used.

    In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. XML processors may recognize other encodings; it is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority, other than those just listed, should be referred to using their registered names. Note that these registered names are defined to be case-insensitive, so processors wishing to match against them should do so in a case-insensitive way.

    In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, for an encoding declaration to occur other than at the beginning of an external entity, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
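    In the absence of external information, the choice of encoding described above can be sketched as a short decision procedure. This is an illustration only, assuming Python: the function name is hypothetical, and the search for an encoding declaration works only for encodings in which the declaration itself is readable as ASCII.

```python
import re

# An encoding declaration at the very start of the entity, per EncodingDecl.
ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def guess_encoding(entity_bytes):
    """Guess an external parsed entity's encoding from its first bytes."""
    # A Byte Order Mark (#xFEFF) signals UTF-16 in one of two byte orders.
    if entity_bytes.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if entity_bytes.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    declaration = ENCODING_DECL.match(entity_bytes)
    if declaration:
        return declaration.group(1).decode("ascii")
    # Neither a Byte Order Mark nor an encoding declaration: must be UTF-8.
    return "UTF-8"
```

    A processor would go on to compare the returned name case-insensitively against the encodings it supports and report a fatal error for one it cannot process.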

    It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process.

    Examples of encoding declarations:

    <?xml encoding='UTF-8'?>
    <?xml encoding='EUC-JP'?>

    XML Processor Treatment of Entities and References

    The table below summarizes the contexts in which character references, entity references, and invocations of unparsed entities might appear and the required behavior of an XML processor in each case. The labels in the leftmost column describe the recognition context:

    Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.

    Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.

    Occurs as Attribute Value: as a Name, not a reference, appearing either as the value of an attribute which has been declared as type ENTITY, or as one of the space-separated tokens in the value of an attribute which has been declared as type ENTITIES.

    Reference in EntityValue: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.

    Reference in DTD: as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue or AttValue.

    Entity Type:

    Context                      | Parameter           | Internal General    | External Parsed General | Unparsed  | Character
    -----------------------------+---------------------+---------------------+-------------------------+-----------+---------------
    Reference in Content         | Not recognized      | Included            | Included if validating  | Forbidden | Included
    Reference in Attribute Value | Not recognized      | Included in literal | Forbidden               | Forbidden | Included
    Occurs as Attribute Value    | Not recognized      | Forbidden           | Forbidden               | Notify    | Not recognized
    Reference in EntityValue     | Included in literal | Bypassed            | Bypassed                | Forbidden | Included
    Reference in DTD             | Included as PE      | Forbidden           | Forbidden               | Forbidden | Forbidden

    Not Recognized

    Outside the DTD, the % character has no special significance; thus, what would be parameter entity references in the DTD are not recognized as markup in content. Similarly, the names of unparsed entities are not recognized except when they appear in the value of an appropriately declared attribute.

    Included

    An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way, except that the replacement text of entities used to escape markup delimiters (the entities amp, lt, gt, apos, quot) is always treated as data. (The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.) A character reference is included when the indicated character is processed in place of the reference itself.

    Included If Validating

    When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor must include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor may, but need not, include the entity's replacement text. If a non-validating parser does not include the replacement text, it must inform the application that it recognized, but did not read, the entity.

    This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

    Forbidden

    The following are forbidden, and constitute fatal errors:

    the appearance of a reference to an unparsed entity.

    the appearance of any character or general-entity reference in the DTD except within an EntityValue or AttValue.

    a reference to an external entity in an attribute value.

    Included in Literal

    When an entity reference appears in an attribute value, or a parameter entity reference appears in a literal entity value, its replacement text is processed in place of the reference itself as though it were part of the document at the location the reference was recognized, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. For example, the following is not well-formed, because the quote character supplied by the entity does not close the attribute value:

    <!ENTITY EndAttr "27'" >
    <element attribute='a-&EndAttr;>
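    The treatment of quote characters in included replacement text can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree (expat) and reuses the EndAttr entity; the closed form of the start-tag is added here for contrast and is not taken from this specification.

```python
import xml.etree.ElementTree as ET

DOCTYPE = """<!DOCTYPE element [<!ENTITY EndAttr "27'">]>"""

# The attribute value is closed by a quote written in the tag itself, so the
# quote that comes from the entity is ordinary data.
CLOSED = DOCTYPE + "<element attribute='a-&EndAttr;'/>"

# Here the only candidate closing quote is inside the replacement text,
# which cannot terminate the literal.
UNCLOSED = DOCTYPE + "<element attribute='a-&EndAttr;>"

attribute_value = ET.fromstring(CLOSED).get("attribute")

try:
    ET.fromstring(UNCLOSED)
    unclosed_rejected = False
except ET.ParseError:
    unclosed_rejected = True
```

    In the closed form the apostrophe from the entity ends up inside the attribute value; in the unclosed form the parser never finds the end of the literal.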

    Notify

    When the name of an unparsed entity appears as a token in the value of an attribute of declared type ENTITY or ENTITIES, a validating processor must inform the application of the system and public (if any) identifiers for both the entity and its associated notation.

    Bypassed

    When a general entity reference appears in the EntityValue in an entity declaration, it is bypassed and left as is.

    Included as PE

    Just as with external parsed entities, parameter entities need only be included if validating. When a parameter-entity reference is recognized in the DTD and included, its replacement text is enlarged by the attachment of one leading and one following space (#x20) character; the intent is to constrain the replacement text of parameter entities to contain an integral number of grammatical tokens in the DTD.

    Construction of Internal Entity Replacement Text

    In discussing the treatment of internal entities, it is useful to distinguish two forms of the entity's value. The literal entity value is the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. The replacement text is the content of the entity, after replacement of character references and parameter-entity references.

    The literal entity value as given in an internal entity declaration (EntityValue) may contain character, parameter-entity, and general-entity references. Such references must be contained entirely within the literal entity value. The actual replacement text that is included as described above must contain the replacement text of any parameter entities referred to, and must contain the character referred to, in place of any character references in the literal entity value; however, general-entity references must be left as-is, unexpanded. For example, given the following declarations:

        <!ENTITY % pub    "&#xc9;ditions Gallimard" >
        <!ENTITY   rights "All rights reserved" >
        <!ENTITY   book   "La Peste: Albert Camus,
        &#xA9; 1947 %pub;. &rights;" >

    then the replacement text for the entity "book" is:

        La Peste: Albert Camus,
        © 1947 Éditions Gallimard. &rights;

    The general-entity reference "&rights;" would be expanded should the reference "&book;" appear in the document's content or an attribute value.

    These simple rules may have complex interactions; for a detailed discussion of a difficult example, see the appendix Expansion of Entity and Character References.
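The construction of replacement text described in this section can be sketched in a few lines. This is an illustrative model only, not a parser: the function name replacement_text and the dictionary of already-built parameter-entity replacement texts are assumptions, and the literal values are those of the "book" example.

```python
import re

# One pattern for the two kinds of reference that are replaced when the
# declaration is read: character references and parameter-entity references.
_REPLACED = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][A-Za-z0-9_.:-]*);"
)


def replacement_text(literal_value, parameter_entities):
    """Build replacement text from a literal entity value.

    Character references and parameter-entity references are replaced in a
    single pass (inserted text is not rescanned); general-entity references
    such as &rights; do not match the pattern and are left untouched.
    """
    def substitute(match):
        if match.group(1):
            return chr(int(match.group(1), 16))
        if match.group(2):
            return chr(int(match.group(2)))
        return parameter_entities[match.group(3)]

    return _REPLACED.sub(substitute, literal_value)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;", {"pub": pub}
)
```

Because re.sub never rescans text it has just inserted, the sketch also reproduces the rule that a character reference yielding "%" or "&" does not start a new reference at declaration time.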

    Predefined Entities

    Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

    All XML processors must recognize these entities whether they are declared or not. For interoperability, valid XML documents should declare these entities, like any others, before using them. If the entities in question are declared, they must be declared as internal entities whose replacement text is the single character being escaped or a character reference to that character, as shown below.

        <!ENTITY lt     "&#38;#60;">
        <!ENTITY gt     "&#62;">
        <!ENTITY amp    "&#38;#38;">
        <!ENTITY apos   "&#39;">
        <!ENTITY quot   "&#34;">

    Note that the < and & characters in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.
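As a quick check of the rule that the predefined entities are recognized without being declared, the following sketch parses a document that has no DTD at all. It assumes Python's standard xml.etree.ElementTree module; the element name t is arbitrary.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE: the five predefined entities must still be recognized.
by_entity = ET.fromstring("<t>&lt;&amp;&gt;&apos;&quot;</t>").text

# Numeric character references escape the same delimiters.
by_number = ET.fromstring("<t>&#60;&#38;</t>").text
```

Both forms arrive at the application as plain character data.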

    Notation Declarations

    Notations identify by name the format of unparsed entities, the format of elements which bear a notation attribute, or the application to which a processing instruction is addressed.

    Notation declarations provide a name for the notation, for use in entity and attribute-list declarations and in attribute specifications, and an external identifier for the notation which may allow an XML processor or its client application to locate a helper application capable of processing data in the given notation.

    Notation Declarations

        NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
        PublicID     ::= 'PUBLIC' S PubidLiteral

    XML processors must provide applications with the name and external identifier(s) of any notation declared and referred to in an attribute value, attribute definition, or entity declaration. They may additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described. (It is not an error, however, for XML documents to declare and refer to notations for which notation-specific applications are not available on the system where the XML processor or application is running.)
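One way to see a processor hand notation information to the application is through expat's declaration callbacks. The sketch below assumes Python's standard xml.parsers.expat module; the document, the notation name gif, and the identifiers gifview and logo.gif are made up for illustration.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "gifview">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT doc EMPTY>
  <!ATTLIST doc pic ENTITY #IMPLIED>
]>
<doc pic="logo"/>"""


def collect_notations(document):
    """Return the notation and unparsed-entity declarations reported by expat."""
    notations, unparsed = [], []
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations.append((name, system_id, public_id))

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed.append((name, system_id, notation_name))

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return notations, unparsed


notations, unparsed = collect_notations(DOC)
```

The application receives the notation's name and external identifier; what it does with them (for example, launching a viewer) is outside the processor's responsibility.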

    Document Entity

    The document entity serves as the root of the entity tree and a starting-point for an XML processor. This specification does not specify how the document entity is to be located by an XML processor; unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

    Conformance

    Validating and Non-Validating Processors

    Conforming XML processors fall into two classes: validating and non-validating.

    Validating and non-validating processors alike must report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read.

    Validating processors must report violations of the constraints expressed by the declarations in the DTD, and failures to fulfill the validity constraints given in this specification. To accomplish this, validating XML processors must read and process the entire DTD and all external parsed entities referenced in the document.

    Non-validating processors are required to check only the document entity, including the entire internal DTD subset, for well-formedness. While they are not required to check the document for validity, they are required to process all the declarations they read in the internal DTD subset and in any parameter entity that they read, up to the first reference to a parameter entity that they do not read; that is to say, they must use the information in those declarations to normalize attribute values, include the replacement text of internal entities, and supply default attribute values. They must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations.
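The obligations of a non-validating processor toward the internal DTD subset can be illustrated with Python's standard xml.etree.ElementTree module, which is built on the non-validating expat parser. This is a small sketch; the element, attribute, and entity names are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE doc [
  <!ENTITY who "world">
  <!ATTLIST doc kind CDATA "plain">
]>
<doc>hello &who;</doc>"""

root = ET.fromstring(DOC)

# The default declared in the internal subset is supplied...
supplied_default = root.get("kind")

# ...and the replacement text of the internal entity is included.
content = root.text
```

No validity checking takes place here (there is not even an element declaration for doc), yet the declarations that were read are still used.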

    Using XML Processors

    The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations. Less is required of a non-validating processor; it need not read any part of the document other than the document entity. This has two effects that may be important to users of XML processors:

    Certain well-formedness errors, specifically those that require reading external entities, may not be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in the section on XML processor treatment of entities and references.

    The information passed from the processor to the application may vary, depending on whether the processor reads parameter and external entities. For example, a non-validating processor may not normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities.

    For maximum reliability in interoperating between different XML processors, applications which use non-validating processors should not rely on any behaviors not required of such processors. Applications which require facilities such as the use of default attributes or internal entities which are declared in external entities should use validating XML processors.

    Notation

    The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

        symbol ::= expression

    Symbols are written with an initial capital letter if they are defined by a regular expression, or with an initial lower case letter otherwise. Literal strings are quoted.

    Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

    #xN where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated. The number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant for XML.

    [a-zA-Z], [#xN-#xN] matches any character with a value in the range(s) indicated (inclusive).

    [^a-z], [^#xN-#xN] matches any character with a value outside the range indicated.

    [^abc], [^#xN#xN#xN] matches any character with a value not among the characters given.

    "string" matches a literal string matching that given inside the double quotes.

    'string' matches a literal string matching that given inside the single quotes.

    These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

    (expression) expression is treated as a unit and may be combined as described in this list.

    A? matches A or nothing; optional A.

    A B matches A followed by B.

    A | B matches A or B but not both.

    A - B matches any string that matches A but does not match B.

    A+ matches one or more occurrences of A.

    A* matches zero or more occurrences of A.

    Other notations used in the productions are:

    /* ... */ comment.

    [ wfc: ... ] well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

    [ vc: ... ] validity constraint; this identifies by name a constraint on valid documents associated with a production.
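To make the #xN and range notation concrete, the following sketch translates the right-hand side of a simple production (an alternation of single code points and ranges) into a Python regular expression. The function name production_to_regex is hypothetical, and only this small subset of the notation is handled.

```python
import re

_RANGE = re.compile(r"\[#x([0-9A-Fa-f]+)-#x([0-9A-Fa-f]+)\]")
_SINGLE = re.compile(r"#x([0-9A-Fa-f]+)")


def production_to_regex(expression):
    """Translate '#xN | [#xN-#xN] | ...' into a regex character class."""
    parts = []
    for term in expression.split("|"):
        term = term.strip()
        range_match = _RANGE.fullmatch(term)
        single_match = _SINGLE.fullmatch(term)
        if range_match:
            low, high = (chr(int(g, 16)) for g in range_match.groups())
            parts.append(re.escape(low) + "-" + re.escape(high))
        elif single_match:
            parts.append(re.escape(chr(int(single_match.group(1), 16))))
        else:
            raise ValueError("unsupported term: " + term)
    return "[" + "".join(parts) + "]"


# First two alternatives of the Digit production, and part of Extender.
digit = re.compile(production_to_regex("[#x0030-#x0039] | [#x0660-#x0669]"))
extender = re.compile(production_to_regex("#x00B7 | #x02D0 | [#x3031-#x3035]"))
```

Leading zeros in #xN are insignificant in the sketch as well, because each value is parsed as a hexadecimal integer before being turned into a character.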

    References

    Normative References

    IANA (Internet Assigned Numbers Authority). Official Names for Character Sets, ed. Keld Simonsen et al. See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.

    IETF (Internet Engineering Task Force). RFC 1766: Tags for the Identification of Languages, ed. H. Alvestrand. 1995.

    ISO (International Organization for Standardization). ISO 639:1988 (E). Code for the representation of names of languages. [Geneva]: International Organization for Standardization, 1988.

    ISO (International Organization for Standardization). ISO 3166-1:1997 (E). Codes for the representation of names of countries and their subdivisions — Part 1: Country codes. [Geneva]: International Organization for Standardization, 1997.

    ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).

    The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996.

    Other References

    Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Reading: Addison-Wesley, 1986, rpt. corr. 1988.

    Berners-Lee, T., R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax and Semantics. 1997. (Work in progress; see updates to RFC1738.)

    Brüggemann-Klein, Anne. Regular Expressions into Finite Automata. Extended abstract in I. Simon, Hrsg., LATIN 1992, S. 97-98. Springer-Verlag, Berlin 1992. Full Version in Theoretical Computer Science 120: 197-213, 1993.

    Brüggemann-Klein, Anne, and Derick Wood. Deterministic Regular Languages. Universität Freiburg, Institut für Informatik, Bericht 38, Oktober 1991.

    James Clark. Comparison of SGML and XML. See http://www.w3.org/TR/NOTE-sgml-xml-971215.

    IETF (Internet Engineering Task Force). RFC 1738: Uniform Resource Locators (URL), ed. T. Berners-Lee, L. Masinter, M. McCahill. 1994.

    IETF (Internet Engineering Task Force). RFC 1808: Relative Uniform Resource Locators, ed. R. Fielding. 1995.

    IETF (Internet Engineering Task Force). RFC 2141: URN Syntax, ed. R. Moats. 1997.

    ISO (International Organization for Standardization). ISO 8879:1986(E). Information processing — Text and Office Systems — Standard Generalized Markup Language (SGML). First edition — 1986-10-15. [Geneva]: International Organization for Standardization, 1986.

    ISO (International Organization for Standardization). ISO/IEC 10744-1992 (E). Information technology — Hypermedia/Time-based Structuring Language (HyTime). [Geneva]: International Organization for Standardization, 1992. Extended Facilities Annexe. [Geneva]: International Organization for Standardization, 1996.

    Character Classes

    Following the characteristics defined in the Unicode standard, characters are classed as base characters (among others, these contain the alphabetic characters of the Latin alphabet, without diacritics), ideographic characters, and combining characters (among others, this class contains most diacritics); these classes combine to form the class of letters. Digits and extenders are also distinguished.

    Characters

        Letter ::= BaseChar | Ideographic

        BaseChar ::= [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6]
            | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148]
            | [#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5]
            | [#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386
            | [#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE]
            | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0
            | [#x03E2-#x03F3] | [#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C]
            | [#x045E-#x0481] | [#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC]
            | [#x04D0-#x04EB] | [#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556]
            | #x0559 | [#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2]
            | [#x0621-#x063A] | [#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE]
            | [#x06C0-#x06CE] | [#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6]
            | [#x0905-#x0939] | #x093D | [#x0958-#x0961] | [#x0985-#x098C]
            | [#x098F-#x0990] | [#x0993-#x09A8] | [#x09AA-#x09B0] | #x09B2
            | [#x09B6-#x09B9] | [#x09DC-#x09DD] | [#x09DF-#x09E1] | [#x09F0-#x09F1]
            | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] | [#x0A13-#x0A28] | [#x0A2A-#x0A30]
            | [#x0A32-#x0A33] | [#x0A35-#x0A36] | [#x0A38-#x0A39] | [#x0A59-#x0A5C]
            | #x0A5E | [#x0A72-#x0A74] | [#x0A85-#x0A8B] | #x0A8D
            | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] | [#x0AAA-#x0AB0] | [#x0AB2-#x0AB3]
            | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 | [#x0B05-#x0B0C]
            | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] | [#x0B32-#x0B33]
            | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] | [#x0B5F-#x0B61]
            | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] | [#x0B99-#x0B9A]
            | #x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] | [#x0BA8-#x0BAA]
            | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] | [#x0C0E-#x0C10]
            | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] | [#x0C60-#x0C61]
            | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] | [#x0CAA-#x0CB3]
            | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] | [#x0D05-#x0D0C]
            | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] | [#x0D60-#x0D61]
            | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] | [#x0E40-#x0E45]
            | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A | #x0E8D
            | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 | #x0EA7
            | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] | #x0EBD
            | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] | [#x10A0-#x10C5]
            | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] | [#x1105-#x1107] | #x1109
            | [#x110B-#x110C] | [#x110E-#x1112] | #x113C | #x113E | #x1140
            | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159
            | [#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169
            | [#x116D-#x116E] | [#x1172-#x1173] | #x1175 | #x119E | #x11A8
            | #x11AB | [#x11AE-#x11AF] | [#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2]
            | #x11EB | #x11F0 | #x11F9 | [#x1E00-#x1E9B] | [#x1EA0-#x1EF9]
            | [#x1F00-#x1F15] | [#x1F18-#x1F1D] | [#x1F20-#x1F45] | [#x1F48-#x1F4D]
            | [#x1F50-#x1F57] | #x1F59 | #x1F5B | #x1F5D | [#x1F5F-#x1F7D]
            | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE | [#x1FC2-#x1FC4]
            | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] | [#x1FE0-#x1FEC]
            | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 | [#x212A-#x212B] | #x212E
            | [#x2180-#x2182] | [#x3041-#x3094] | [#x30A1-#x30FA] | [#x3105-#x312C]
            | [#xAC00-#xD7A3]

        Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]

        CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1]
            | [#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4
            | [#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF]
            | [#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903]
            | #x093C | [#x093E-#x094C] | #x094D | [#x0951-#x0954] | [#x0962-#x0963]
            | [#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4]
            | [#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02
            | #x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48]
            | [#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC
            | [#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03]
            | #x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D]
            | [#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8]
            | [#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44]
            | [#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83]
            | [#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6]
            | [#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D]
            | #x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1
            | [#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19]
            | #x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F
            | [#x0F71-#x0F84] | [#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97
            | [#x0F99-#x0FAD] | [#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1
            | [#x302A-#x302F] | #x3099 | #x309A

        Digit ::= [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F]
            | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F]
            | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F]
            | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]

        Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6
            | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

    The character classes defined here can be derived from the Unicode character database as follows:

    Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

    Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

    Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.

    Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

    The following characters are treated as name-start characters rather than name characters, because the property file classifies them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.

    Characters #x20DD-#x20E0 are excluded (in accordance with Unicode, section 5.14).

    Character #x00B7 is classified as an extender, because the property list so identifies it.

    Character #x0387 is added as a name character, because #x00B7 is its canonical equivalent.

    Characters ':' and '_' are allowed as name-start characters.

    Characters '-' and '.' are allowed as name characters.
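The derivation rules listed above can be approximated with Python's standard unicodedata module. This is a rough sketch only: the module ships a much newer Unicode database than the Unicode 2.0 data the productions were derived from, so results for individual characters may differ from the normative tables, and the function name classify is an assumption.

```python
import unicodedata

NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}
# Explicitly treated as name-start characters by the rules above.
FORCED_NAME_START = set(range(0x02BB, 0x02C2)) | {0x0559, 0x06E5, 0x06E6}
# Explicitly excluded by the rules above.
EXCLUDED = set(range(0x20DD, 0x20E1))


def classify(ch):
    """Return 'name-start', 'name', or None for a single character."""
    cp = ord(ch)
    if ch in ":_" or cp in FORCED_NAME_START:
        return "name-start"
    if ch in "-." or cp in (0x00B7, 0x0387):
        return "name"
    if cp in EXCLUDED or 0xF900 < cp < 0xFFFE:
        return None  # excluded range or compatibility area
    if unicodedata.decomposition(ch).startswith("<"):
        return None  # font or compatibility decomposition
    category = unicodedata.category(ch)
    if category in NAME_START_CATEGORIES:
        return "name-start"
    if category in NAME_CATEGORIES:
        return "name"
    return None
```

The explicit exceptions are checked before the general category rules, mirroring the way the list above overrides the database for a handful of characters.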

    XML and SGML

    XML is designed to be a subset of SGML, in that every valid XML document should also be a conformant SGML document. For a detailed comparison of the additional restrictions that XML places on documents beyond those of SGML, see James Clark's Comparison of SGML and XML, listed in the references.

    Expansion of Entity and Character References

    This appendix contains some examples illustrating the sequence of entity- and character-reference recognition and expansion, as specified in the section on XML processor treatment of entities and references.

    If the DTD contains the declaration

        <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
        numerically (&#38;#38;#38;) or with a general entity
        (&amp;amp;).</p>" >

    then the XML processor will recognize the character references when it parses the entity declaration, and resolve them before storing the following string as the value of the entity "example":

        <p>An ampersand (&#38;) may be escaped
        numerically (&#38;#38;) or with a general entity
        (&amp;amp;).</p>

    A reference in the document to "&example;" will cause the text to be reparsed, at which time the start- and end-tags of the "p" element will be recognized and the three references will be recognized and expanded, resulting in a "p" element with the following content (all data, no delimiters or markup):

        An ampersand (&) may be escaped
        numerically (&#38;) or with a general entity
        (&amp;).

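    A more complex example will illustrate the rules and their effects fully. In the following example, the line numbers are solely for reference.

        1 <?xml version='1.0'?>
        2 <!DOCTYPE test [
        3 <!ELEMENT test (#PCDATA) >
        4 <!ENTITY % xx '&#37;zz;'>
        5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
        6 %xx;
        7 ]>
        8 <test>This sample shows a &tricky; method.</test>

    This produces the following: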

    in line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since "zz" is not yet declared.)

    in line 5, the character reference "&#60;" is expanded immediately and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.

    in line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed. The reference to "zz" is recognized in its turn, and its replacement text ("<!ENTITY tricky "error-prone" >") is parsed. The general entity "tricky" has now been declared, with the replacement text "error-prone".

    in line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the "test" element is the self-describing (and ungrammatical) string This sample shows a error-prone method.
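The two stages of the first example in this appendix (resolving character references when the declaration is read, then expanding references when the entity is included) can be modeled as follows. This is a simplified sketch, not a parser: markup recognition is not modeled, so the p tags remain in the string, and the function names are hypothetical.

```python
import re

REFERENCE = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][A-Za-z0-9_.:-]*);"
)


def store(literal_value):
    """Declaration time: resolve character references, bypass entity references."""
    def substitute(match):
        if match.group(1):
            return chr(int(match.group(1), 16))
        if match.group(2):
            return chr(int(match.group(2)))
        return match.group(0)  # general-entity reference left as-is
    return REFERENCE.sub(substitute, literal_value)


def include(text, entities):
    """Reference time: expand character references (never rescanned) and
    general-entity references (whose replacement text is parsed in turn)."""
    def substitute(match):
        if match.group(1):
            return chr(int(match.group(1), 16))
        if match.group(2):
            return chr(int(match.group(2)))
        return include(entities[match.group(3)], entities)
    return REFERENCE.sub(substitute, text)


entities = {"amp": store("&#38;#38;")}
entities["example"] = store(
    "<p>An ampersand (&#38;#38;) may be escaped numerically "
    "(&#38;#38;#38;) or with a general entity (&amp;amp;).</p>"
)
result = include("&example;", entities)
```

Each pass strips exactly one level of escaping, which is why the declaration needs "&#38;#38;#38;" to deliver "&#38;" as literal text to the application.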

    Deterministic Content Models

    For compatibility, it is required that content models in element type declarations be deterministic.

    SGML requires deterministic content models (it calls them "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

    For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser cannot know which b in the model is being matched without looking ahead to see which element follows the b. In this case, the two references to b can be collapsed into a single reference, making the model read (b, (c | d)). An initial b now clearly matches only a single name in the content model. The parser doesn't need to look ahead to see what follows; either c or d would be accepted.

    More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman (see the references). In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.

    Algorithms exist which allow many but not all non-deterministic content models to be reduced automatically to equivalent deterministic models; see Brüggemann-Klein and Wood 1991 in the references.
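The follow-set test described above can be sketched with a position-based construction over a small syntax tree. The tuple encoding of content models used here ("sym", "seq", "alt", "opt", "star", "plus") and the function name is_deterministic are assumptions made for the example; the set of initial positions is checked in the same way as the follow sets.

```python
from itertools import count


def is_deterministic(model):
    """Return True if no first or follow set contains two positions
    labeled with the same element type name."""
    names = {}    # position -> element type name
    follow = {}   # position -> set of positions that may follow it
    counter = count()

    def walk(node):
        """Return (nullable, first positions, last positions) of a subtree."""
        kind = node[0]
        if kind == "sym":
            position = next(counter)
            names[position] = node[1]
            follow[position] = set()
            return False, {position}, {position}
        if kind in ("opt", "star", "plus"):
            nullable, first, last = walk(node[1])
            if kind != "opt":  # repetition: last positions loop back to first
                for position in last:
                    follow[position] |= first
            return nullable or kind != "plus", first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # kind == "seq": everything ending the left part may be followed
        # by anything starting the right part
        for position in l1:
            follow[position] |= f2
        return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())

    def unique_labels(positions):
        labels = [names[p] for p in positions]
        return len(labels) == len(set(labels))

    _, first, _ = walk(model)
    return unique_labels(first) and all(unique_labels(s) for s in follow.values())


b, c, d = ("sym", "b"), ("sym", "c"), ("sym", "d")
ambiguous = ("alt", ("seq", b, c), ("seq", b, d))   # ((b, c) | (b, d))
rewritten = ("seq", b, ("alt", c, d))               # (b, (c | d))
```

For the two models discussed above, the first is rejected because both b positions can start the content, while the rewritten form passes.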

    Autodetection of Character Encodings

    The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself. Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying (external) information. We consider the first case first.

    Because each XML entity not in UTF-8 or UTF-16 format must begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF".

    00 00 00 3C: UCS-4, big-endian machine (1234 order)

    3C 00 00 00: UCS-4, little-endian machine (4321 order)

    00 00 3C 00: UCS-4, unusual octet order (2143)

    00 3C 00 00: UCS-4, unusual octet order (3412)

    FE FF: UTF-16, big-endian

    FF FE: UTF-16, little-endian

    00 3C 00 3F: UTF-16, big-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 00 3F 00: UTF-16, little-endian, no Byte Order Mark (and thus, strictly speaking, in error)

    3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the ASCII characters, the encoding declaration itself may be read reliably

    4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)

    other: UTF-8 without an encoding declaration, or else the data stream is corrupt, fragmentary, or enclosed in a wrapper of some kind
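The table above translates directly into a lookup on the first octets of an entity. The sketch below follows the order of the list; the function name detect_encoding_family and the wording of the returned labels are assumptions, and the result only identifies a family, after which the encoding declaration still has to be read.

```python
FOUR_OCTET_PATTERNS = {
    b"\x00\x00\x00\x3c": "UCS-4, big-endian (1234 order)",
    b"\x3c\x00\x00\x00": "UCS-4, little-endian (4321 order)",
    b"\x00\x00\x3c\x00": "UCS-4, unusual octet order (2143)",
    b"\x00\x3c\x00\x00": "UCS-4, unusual octet order (3412)",
    b"\x00\x3c\x00\x3f": "UTF-16, big-endian, no Byte Order Mark",
    b"\x3c\x00\x3f\x00": "UTF-16, little-endian, no Byte Order Mark",
    b"\x3c\x3f\x78\x6d": "ASCII-compatible (UTF-8, ISO 646, ISO 8859, Shift-JIS, EUC, ...)",
    b"\x4c\x6f\xa7\x94": "EBCDIC (some flavor)",
}


def detect_encoding_family(data):
    """Guess the encoding family of an XML entity from its first octets."""
    head = bytes(data[:4])
    if head in FOUR_OCTET_PATTERNS:
        return FOUR_OCTET_PATTERNS[head]
    if head[:2] == b"\xfe\xff":
        return "UTF-16, big-endian"
    if head[:2] == b"\xff\xfe":
        return "UTF-16, little-endian"
    return "UTF-8 without an encoding declaration, or corrupt/fragmentary/wrapped data"
```

A caller would typically use the returned family to pick a provisional decoder, read the encoding declaration with it, and then switch to the declared encoding.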

    This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859 from each other, or to distinguish the specific EBCDIC code page in use, and so on).

    Because the contents of the encoding declaration are restricted to ASCII characters, a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable.

    Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input.

    Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration. Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity.

    The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended.

    If an XML entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

    If an XML entity is delivered with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

    These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.

    W3C XML Working Group

    This specification was prepared and approved for publication by the W3C XML Working Group (WG). WG approval of this specification does not necessarily imply that all WG members voted for its approval. The current and former members of the XML WG are:

    Jon Bosak, Sun (Chair)
    James Clark (Technical Lead)
    Tim Bray, Textuality and Netscape (XML Co-editor)
    Jean Paoli, Microsoft (XML Co-editor)
    C. M. Sperberg-McQueen, U. of Ill. (XML Co-editor)
    Dan Connolly, W3C (W3C Liaison)
    Paula Angerstein, Texcel
    Steve DeRose, INSO
    Dave Hollander, HP
    Eliot Kimber, ISOGEN
    Eve Maler, ArborText
    Tom Magliery, NCSA
    Murray Maloney, Muzmo and Grif
    Makoto Murata, Fuji Xerox Information Systems
    Joel Nava, Adobe
    Conleth O'Connell, Vignette
    Peter Sharpe, SoftQuad
    John Tigue, DataChannel
    XML-XSLT-0.48/examples/grammar.xsl

    Example application of XML::XSLT

    Extraction of grammar rules from Recommendations
    [] ::=
    XML-XSLT-0.48/examples/XSLT.xml0100644000076500007650000003422507421365427016250 0ustar jonathanjonathan XML::XSLT - A perl module for processing XSLT SYNOPSIS new ($xsl, warnings => 1); ]]> transform ($xmlfile); print $xslt->toString; ]]> dispose(); ]]> DESCRIPTION This module implements the W3C's XSLT specification. The goal is full implementation of this spec, but we have not yet achieved that. However, it already works well. See XML::XSLT Commands for the current status of each command. XML::XSLT makes use of XML::DOM and LWP::Simple, while XML::DOM uses XML::Parser. Therefore XML::Parser, XML::DOM and LWP::Simple have to be installed properly for XML::XSLT to run. Specifying Sources The stylesheets and the documents may be passed as filenames, file handles regular strings, string references or DOM-trees. Functions that require sources (e.g. new), will accept either a named parameter or simply the argument. Either of the following are allowed: new($xsl); my $xslt = XML::XSLT->new(Source => $xsl); ]]> In documentation, the named parameter `Source' is always shown, but it is never required. METHODS new(Source => $xml [, %args]) Returns a new XSLT parser object. Valid flags are: DOMparser_args Hashref of arguments to pass to the XML::DOM::Parser object's parse method. variables Hashref of variables and their values for the stylesheet. base Base of URL for file inclusion. debug Turn on debugging messages. warnings Turn on warning messages. indent Starting amount of indention for debug messages. Defaults to 0. indent_incr Amount to indent each level of debug message. Defaults to 1. open_xml(Source => $xml [, %args]) Gives the XSLT object new XML to process. Returns an XML::DOM object corresponding to the XML. base The base URL to use for opening documents. parser_args Arguments to pase to the parser. open_xsl(Source => $xml, [, %args]) Gives the XSLT object a new stylesheet to use in processing XML. Returns an XML::DOM object corresponding to the stylesheet. 
Any arguments present are passed to the XML::DOM::Parser. base The base URL to use for opening documents. parser_args Arguments to pase to the parser. process(%variables) Processes the previously loaded XML through the stylesheet using the variables set in the argument. transform(Source => $xml [, %args]) Processes the given XML through the stylesheet. Returns an XML::DOM object corresponding to the transformed XML. Any arguments present are passed to the XML::DOM::Parser. serve(Source => $xml [, %args]) Processes the given XML through the stylesheet. Returns a string containg the result. Example: new($xsl); print $xslt->serve $xml; ]]> http_headers If true, then prepends the appropriate HTTP headers (e.g. Content-Type, Content-Length); Defaults to true. xml_declaration If true, then the result contains the appropriate <?xml?> header. Defaults to true. xml_version The version of the XML. Defaults to 1.0. doctype The type of DOCTYPE this document is. Defaults to SYSTEM. toString Returns the result of transforming the XML with the stylesheet as a string. to_dom Returns the result of transforming the XML with the stylesheet as an XML::DOM object. media_type Returns the media type (aka mime type) of the object. dispose Executes the dispose method on each XML::DOM object. XML::XSLT Commands xsl:apply-imports no Not supported yet. xsl:apply-templates limited Attribute 'select' is supported to the same extent as xsl:value-of supports path selections. Not supported yet: - attribute 'mode' - xsl:sort and xsl:with-param in content xsl:attribute partially Adds an attribute named to the value of the attribute 'name' and as value the stringified content-template. Not supported yet: - attribute 'namespace' xsl:attribute-set yes Partially xsl:call-template yes Takes attribute 'name' which selects xsl:template's by name. 
Weak support: - xsl:with-param ('select' attribute not supported) Not supported yet: - xsl:sort xsl:choose yes Tests all xsl:when elements in sequence until one succeeds or until an xsl:otherwise is found. Limited test support, see xsl:when xsl:comment yes Supported. xsl:copy partially xsl:copy-of limited Attribute 'select' works to the same extent as with xsl:value-of xsl:decimal-format no Not supported yet. xsl:element yes xsl:fallback no Not supported yet. xsl:for-each limited Attribute 'select' works to the same extent as with xsl:value-of Not supported yet: - xsl:sort in content xsl:if limited Identical to xsl:when, but outside xsl:choose context. xsl:import no Not supported yet. xsl:include yes Takes attribute href, which can be relative-local, absolute-local or a URL (preceded by identifier http:). xsl:key no Not supported yet. xsl:message no Not supported yet. xsl:namespace-alias no Not supported yet. xsl:number no Not supported yet. xsl:otherwise yes Supported. xsl:output limited Only the initial xsl:output element is used. The "text" output method is not supported, but shouldn't be difficult to implement. Only the "doctype-public", "doctype-system", "omit-xml-declaration", "method", and "encoding" attributes have any support. xsl:param experimental Synonym for xsl:variable (currently). See xsl:variable for support. xsl:preserve-space no Not supported yet. Whitespace is always preserved. xsl:processing-instruction yes Supported. xsl:sort no Not supported yet. xsl:strip-space no Not supported yet. No whitespace is stripped. xsl:stylesheet limited Minor namespace support: a namespace prefix other than 'xsl:' for xsl-commands is allowed if the xmlns attribute is present. The xmlns URL is verified. Other attributes are ignored. xsl:template limited Attributes 'name' and 'match' are supported to a minor extent. ('name' must match exactly and 'match' must match with full path or no path) Not supported yet: - attributes 'priority' and 'mode' xsl:text yes Supported.
xsl:transform limited Synonym for xsl:stylesheet xsl:value-of limited Inserts attribute or element values. Limited support: <xsl:value-of select="."/> <xsl:value-of select="/root-elem"/> <xsl:value-of select="elem"/> <xsl:value-of select="//elem"/> <xsl:value-of select="elem[n]"/> <xsl:value-of select="//elem[n]"/> <xsl:value-of select="@attr"/> <xsl:value-of select="text()"/> <xsl:value-of select="processing-instruction()"/> <xsl:value-of select="comment()"/> and combinations of these. Not supported yet: - attribute 'disable-output-escaping' xsl:variable experimental Very limited. It should be possible to define a variable and use it with <xsl:value-of select="$varname"/> within the same template. xsl:when limited Only inside xsl:choose. Limited test support: <xsl:when test="@attr='value'"> <xsl:when test="elem='value'"> <xsl:when test="path/[@attr='value']"> <xsl:when test="path/[elem='value']"> <xsl:when test="path"> path is supported to the same extent as with xsl:value-of xsl:with-param experimental It is currently not functioning. (or is it?) SUPPORT General information, bug reporting tools, the latest version, mailing lists, etc. can be found at the XML::XSLT homepage: DEPRECATIONS Methods and interfaces from previous versions that are not documented in this version are deprecated. Each deprecated method or interface can still be used, but produces a warning the first time it is used. You can use the old interfaces without warnings by passing new() the flag use_deprecated. Example: my $xslt = XML::XSLT->new($xsl, "FILE", use_deprecated => 1); The deprecated methods will disappear by the time a 1.0 release is made.
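As a rough illustration of the current (non-deprecated) interface documented in METHODS, the following sketch runs a transformation with new, transform, toString and dispose. The stylesheet, the input document and the element names (doc, title, greeting) are made up for this example and are not part of the distribution; the sources are passed as string references, as the test scripts in t/ do.

```perl
use strict;
use warnings;
use XML::XSLT;

# Hypothetical stylesheet, inlined so the sketch is self-contained.
my $stylesheet = <<'EOS';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="doc">
    <greeting><xsl:value-of select="title"/></greeting>
  </xsl:template>
</xsl:stylesheet>
EOS

# Hypothetical input document.
my $xml = '<doc><title>Hello</title></doc>';

# Create the processor from the stylesheet (string reference as source).
my $xslt = XML::XSLT->new(\$stylesheet, warnings => 1);

# transform() is the documented replacement for transform_document.
$xslt->transform(\$xml);

# toString() is the documented replacement for output_string / result_string.
my $result = $xslt->toString;
print "$result\n";

# Release the underlying XML::DOM objects.
$xslt->dispose();
```

Code written this way does not need the use_deprecated flag, since it only calls methods that the METHODS section documents.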
The deprecated methods are: output_string (use toString instead), result_string (use toString instead), output (use toString instead), result (use toString instead), result_mime_type (use media_type instead), output_mime_type (use media_type instead), result_tree (use to_dom instead), output_tree (use to_dom instead), transform_document (use transform instead), process_project (use process instead), open_project (use the Source argument to new() and transform instead), print_output (use serve() instead). BUGS Yes. HISTORY Geert Josten and Egon Willighagen developed and maintained XML::XSLT up to version 0.22. At that point, Mark Hershberger started moving the project to Sourceforge and began working on it with Bron Gondwana. LICENCE Copyright (c) 1999 Geert Josten & Egon Willighagen. All Rights Reserved. This module is free software, and may be distributed under the same terms and conditions as Perl. AUTHORS Geert Josten <gjosten@sci.kun.nl> Egon Willighagen <egonw@sci.kun.nl> Mark A. Hershberger <mah@everybody.org> Bron Gondwana <perlcode@brong.net> Jonathan Stowe <jns@gellyfish.com> SEE ALSO XML::DOM, LWP::Simple, XML::Parser XML-XSLT-0.48/examples/identity.xml0100644000076500007650000000401607120112224017257 0ustar jonathanjonathan hoi piepeloi! Dit is wat test tekst... Nieuwjaarsborrel 4/1/1999 Subfaculteit Scheikunde kantine B-faculteit 16.30 Informed Chemistry: what can it do for synthesis? 13/1/1999 Chemweb.Com Internet 16.00 "Nieuwe materialen op basis van organische synthese" 2/2/1999 NSR Spreker: dr. Frank van Veggel, Laboratorium voor organische chemie, Universiteit Twente
    Gastheer: Prof. dr. RJM Nolte
    CZ I 14.00
    Paaslympics 5/4/1999 St. Beet en BeeVee W en N Paas-Beestborrel 6/4/1999 BBB en Leonardo X-Files: Fight the Future 6/4/1999 St. Beet CZ N2 19.30u 1,50 Geert Josten!?!?
    XML-XSLT-0.48/examples/xpath.xsl0100644000076500007650000000335407115344404016577 0ustar jonathanjonathan

    Function: ( , * ? )

    (available in ) or , or ,
    XML-XSLT-0.48/examples/agenda.xml0100644000076500007650000006535207115344402016670 0ustar jonathanjonathan Nieuwjaarsborrel 4/1/1999 Subfaculteit Scheikunde kantine B-faculteit 16.30 Informed Chemistry: what can it do for synthesis? 13/1/1999 Chemweb.Com Internet 16.00 Nieuwjaarsborrel 13/1/1999 Sigma Ul-kantine 16.00 The development of oxidation state +IV for palladium in it's organometalic chemistry 25/1/1999 Anorg. Chem. (NSR Center) Spreker: Prof. dr. A.J. Canty, University of Tasmania
    A0004 11.30-12.30
    Virtuele Universiteit of Universele Virtualiteit 26/1/1999 Gemeente Nijmegen Raadhuis Nijmegen 13:30 Exploring tertiary folding in RNA 28/1/1999 KUN Door M.H.Kolk
    Promotor: prof.dr.C.W.Hilbers
    Copromotor: dr. H.A.Heus
    KUN Aula 13:30 precies
    Schaatsen 29/1/1999 Sigma Triavium va 14.00
    Begin Voorjaarssemester 1/2/1999 KUN "Nieuwe materialen op basis van organische synthese" 2/2/1999 NSR Spreker: dr. Frank van Veggel, Laboratorium voor organische chemie, Universiteit Twente
    Gastheer: Prof. dr. RJM Nolte
    CZ I 14.00
    "Transition metal catalysed carbon-carbon bond formation" 2/2/1999 NSR Spreker: dr Paul CJ Kamer, Institute of Molecular Chemistry, University of Amsterdam
    Gastheer: Prof. dr. RJM Nolte
    CZ I 16.00
    "Novel approaches for the synthesis of druglike building blocks" 3/2/1999 NSR Spreker: dr Floris PJT Rutjes, Institute of Molecular Chemistry, University of Amsterdam
    Gastheer: Prof. dr. RJM Nolte
    CZ I 10.00
    BBB-bestuurswisselborrel 4/2/1999 BBB Collegezalenrondgang 16.30u Alien IV 4/2/1999 St. Beet Kosten: f1,50 CZ N2 19.30u Deadline G-mi 5/2/1999 Sollicitatietraining 9/2/1999 BBB 10.30u-12.30u CZ N3 Sigma avond 9/2/1999 Sigma Cafe de Fiets 21.00u Natuurkunde in een Notendop 10/2/1999 9.30u-16.30u Marie Curie Tafelvoetbaltoernooi 10/2/1999 15.45u Thalia Inschrijven bij Thalia
    Deelname toernooi gratis
    Aansluitend borrel (niet gratis)
    W en N Carnavals-Beestborrel 11/2/1999 Marie-Curie kantine Marie Curie The Structure and Fluxional Behaviour of the Binary Carbonyls 11/2/1999 Chemweb.Com Internet 17.00 BeestFeest 11/2/1999 Doornroosje St. Beet Carnavalsvakantie 15:19/2/1999 Sigma avond 23/2/1999 Sigma Cafe de Fiets 21.00u Schilderijrestauraties 24/2/1999 16.00 CZ N1 Sigma ICT aan de KUN en daarbuiten 26/2/1999 10.00 Collegezalencomplex, Mercatorpad 1 IOWO
    Spelletjesavond 9/3/1999 Sigma Cafe de Fiets 19.30 Glas: helder en fascinerend, materie en fenomeen 9/3/1999 Studium Generale Collegezalencomplex 20:00 Sprekers: prof.dr. Carel L. Davidson en Louis Goosen
    Feest met A-faculteiten: Fantasy-Feest 10/3/1999 Sigma, Thalia, Desda, Postelein, Svn en Sophia 22.00 Diogenes entree:
    f2,50 voor leden
    f3,50 voor niet leden
    f1,00 voor Diogenes leden
    Glas: Bezoek aan de glasinstrumentenmakerij van de KUN 11/3/1999 Studium Generale Toernooiveld 1 17:00 Inschrijven verplicht.
    Inlichtingen:
    Monique van Haaster 024-3615912 en 024-3612726
    Wijnproeven 11/3/1999 17:00 5,00 Marjolijn of Irene Slechts een beperkt aantal mensen kan meedoen
    South Park 11/3/1999 St. Beet CZ N2 19.30u 1,50 'Selection in neural information processing' 12/3/1999 13:30 Precies KUN Aula P.J.L.J. van de Laar
    Promotor: prof.dr. C.C.A.M. Gielen
    Copromotor: dr. T.M. Heskes
    Sigma Symposium: Economie in de Erlenmyer 17/3/1999 Sigma CZ N2 Metal Ion Cage Complexes: Synthesis, Reactivity and Uses 17/3/1999 Chemweb 14:30 Internet Victor Westhoff-lezing: "Over de toestand van van natuur en milieu in Nederland en daarbuiten" 17/3/1999 KUN 14:30 KUN Aula Assessmenttraining 23/3/1999 BBB 10.30u-12.30u CZ N4 De Engelse Industriële Revolutie 24/3/1999 SCN99 14.00u CZ N6 dr. A. Bots Beheersing van afvalwaterlozing van de chemische industrie in UK 24/3/1999 SCN99 14.00u CZ N6 dr. R. Leuven Keltische Week: Whiskey Proeven 24/3/1999 SPC 16.00u Cultuur Cafe 5,- tel. 3615908 Ledenvergadering 31/3/1999 Sigma 20.00u Bovenzaal cafe de Fiets
    Grolsch 1/4/1999 Enschede 13.30 Sigma Studiereis lezing 1/4/1999 SCN99 16.00u CZ N6 Deadline G-mi 1/4/1999 Goede Vrijdag 2/4/1999 2e Paasdag 5/4/1999 Paaslympics 5/4/1999 St. Beet en BeeVee W en N Paas-Beestborrel 6/4/1999 BBB en Leonardo X-Files: Fight the Future 6/4/1999 St. Beet CZ N2 19.30u 1,50 Programmeren in C 7/4/1999 Sigma Subfaculteits bestuursvergadering 7/4/1999 CZ III 16.15 Na afloop borrel Studiereis lezing 7/4/1999 SCN99 13.30u CZ N1 BeestFeest 8/4/1999 St. Beet Weekendkamp 9:11/4/1999 Sigma Darten met Thalia 13/4/1999 Thalia en Sigma All-In In vivo 13C MR spectroscopy for human investigations 13/4/1999 KUN KUN-Aula/Congresgebouw 15:30 precies Door: A.J.van den Bergh
    Promotor: prof.dr. A. Heerschap
    Van Melsenprijs 16/4/1999 B-Faculteit Deze prijs wordt jaarlijks uitgereikt aan leerlingen uit 5-havo en 6-vwo, voor experimenten uitgevoerd voor het schoolonderzoek. Biologie symposium 16/4/1999 i.v.m. het biologie symposium is de UL-kantine vandaag gesloten Ouderdag 17/4/1999 Sigma Molecular Simulation '99 21/4/1999 - 4/5/199 VEI Internetconferentie met zeer veel (gratis) lezingen. Internet GRATIS Blade 22/4/1999 St. Beet CZ N2 19.30u 1,50 In de pauze is er koffie en thee. Neem zelf een kopje mee! Pre-Batavierenrace feest 22/4/1999 Kolpinghuis 22.30u 3,50 met Midnight run, Energieke studentenband uit Eindhoven
    Batavierenrace 24+25/4/1999 St. Batavierenrace Schrijf je in bij een van de leden van de sportcie Snuffelweek op de B-faculteit 27/4/1999 BeeVee, Desda, Leonardo, Marie-Curie, Sigma, Thalia 11.00-16.30 Grasveld B-faculteit Borrel om 15.00 Karaokeshow 28/4/1999 Sigma UL-kantine 16.00 Cap Gemini 29/4/1999 12.00-16.30 Sigma Koninginnedag 30/4/1999
    Uiterste inschrijfdatum Unilever B.C. 5/5/1999 De Bussiness Course is van 6-9 juli. Studiereis naar Engeland en Schotland 2:16/5/1999 Meivakantie 3:7/5/1999 Chemische wapens 11/5/1999 Sigma 16.00 CZN1 Patricia Dankers The Truman show 11/5/1999 St. Beet CZ N2 19.30u 1,50 In de pauze is er koffie en thee. Neem zelf een kopje mee! Bijbels Openluchtmuseum 12/5/1999 Sigma 's-middags Heilige Landstichting Irene Reynhout Hemelvaart 13+14/5/1999 Open Nederlandse Chemie Sportdagen 13+14/5/1999 NMR studies of fusarium solani pisi cutinase. Structure-mobility-function relationships 18/5/1999 KUN Door J.J. Prompers
    Promotor: prof.dr.C.W.Hilbers
    Copromotor: prof.dr.ir.H.A.M. Pepermans
    KUN Aula 13:30 precies
    Soundmixshow 19/5/1999 Cultureel Cafe The challenge of pragmatic process philosophy 19:21/5/1999 CZ II BetaBedrijvenBeurs 20/5/1999 BBB Rondgang Feest 2e jaars jaarraad 20/5/1999 De Fiets 21.00 Jaarraad '97 Voorlichtingsbijeenkomst studie Natuurwetenschappen 21/5/1999 A3012 12.45-13.45 Voor meer informatie neem contact op met:
    Dr. G. W. Vuister
    tel. 52321, UL 274
    Pinksteren 24/5/1999 Het kriebelt 99 25:29/5/1999 Diogenes & SPC Voor meer info kijk in het programmaboekje, volg de link of bel Diogenes (3604842) of SPC (3612823) W & N Eindejaars-Beestborrel 26/5/1999 Thalia kantine Thalia Freddie, de koele kikker 26/5/1999 St. Beet CZ N2 19.30u 1,50 In de pauze is er koffie en thee. Neem zelf een kopje mee! Voorlichtingsbijeenkomst predoctorale lerarenopleiding 26/5/1999 CZN6 16.00-17.00 Unilo De bijeenkomst zal gaan over inhoud en werkwijze v/d predoc. lerarenopl.
    Beestfeest 27/5/1999 St. Beet Doornroosje 21.30-4.00 Entree gratis
    Garderobe: f1,=
    Bier: f2,75
    KNCV-bedrijfseconomiedag 3/6/1999 KNCV Nijmegen Deadline G-mi 4/6/1999 Eindejaars borrel 7/6/1999 UL-kantine 16.00 Sigma Met speciale Sigma-onthulling Feest 1e jaars jaarraad 9/6/1999 cafe de Fiets 21.30 Jaarraad '98 Entree gratis Industriële Chemie 14:18/6/1999 CZ I 10.00-15.00 Meer informatie:
    Prof. dr. A. Bruggink
    tel. 53331 (secr. 52676)
    Barbecue 22/6/1999 Sigma 15,= Wylerbergmeer Dendritic Molecules, Concepts, Synthesis and Perspectives 25/6/1999 CZ III 14.00-15.00 Nijmegen SON Research Institute Spreker:
    Prof.Dr. G.R. Newkome
    Vice President for Research
    USF, Office of Research
    University of South Florida
    Een magnetische blik op het leven 2/7/1999 KUN-Aula 15.00 KUN Spreker:
    dr. A. Heerschap
    Unilever Bussines Course 6:9/7/1999 Inschrijving voor 5 mei. Zomervakantie 16/7/1999-31/8/1999 Voor diegene die vakantie hebben:
    Veel plezier en tot volgend jaar!
    Essaywedstrijd voor studenten 31/8/1999 KUN Studenten kunnen meedoen aan een essaywedstrijd. Hiervoor dient een wetenschappelijk es-say te worden geschreven over het thema 'Met de ziel onder de arm. Over de lichamelijkheid van de geest'. Inleveren tot 1 september.
    Inl. Studium Generale Nijmegen 024-3615760
    XML-XSLT-0.48/examples/91-22-5.cml.xml0100644000076500007650000000070007120107675017026 0ustar jonathanjonathan C9H7N 129.2 237.1 -14.9 1.098 2.2 XML-XSLT-0.48/examples/agenda.dtd0100644000076500007650000000250007115344376016637 0ustar jonathanjonathan XML-XSLT-0.48/examples/xslt.xsl0100644000076500007650000000733007115344404016443 0ustar jonathanjonathan ]> p.element-syntax { border: solid thin }

    < xsl:

    <xsl:

    <!-- Category: -->
    >
      <!-- Content: -->
    </xsl: >
    ( ) #PCDATA
    xsl: , | + * ?  />
       =
    | | " " { }
    XML-XSLT-0.48/examples/95-48-7.xml0100644000076500007650000000026507115344336016301 0ustar jonathanjonathan 191.04 29.8 XML-XSLT-0.48/examples/bernhard.xml0100644000076500007650000000064307115344402017226 0ustar jonathanjonathan INNSBRUCK/Zentrum INNSBRUCK/Reichenau KRAMSACH/Angerberg WOERGL/Stelzh.Str. KUFSTEIN/Zentrum LIENZ/Amlacherkreuzung XML-XSLT-0.48/examples/cml2cml.xsl0100644000076500007650000000335107115344403017000 0ustar jonathanjonathan ]]> ]]> type="text/xsl" href="cml.xsl" XML-XSLT-0.48/examples/test2.xsl0100644000076500007650000001027407115344403016512 0ustar jonathanjonathan <xsl:value-of select="@NAME"/> Agenda
    XML-XSLT-0.48/MANIFEST0100644000076500007650000000175610014756637014251 0ustar jonathanjonathanChangeLog Makefile.PL README xslt-parser lib/XML/XSLT.pm META.yml MANIFEST examples/XSLT.xsl examples/XSLT.html examples/XSLT.xml examples/91-22-5.cml examples/91-22-5.cml.xml examples/91-22-5.xml examples/95-48-7.xml examples/95-48-7.xsl examples/REC-xml-19980210.xml examples/REC-xslt-19991116.xml examples/agenda.dtd examples/agenda.html examples/agenda.xml examples/agenda.xsl examples/bernhard.xml examples/bernhard.xsl examples/cml.xsl examples/cml2cml.xsl examples/grammar.xml examples/grammar.xsl examples/grammar2.xml examples/grammar2.xsl examples/identity.xml examples/identity.xsl examples/identity.xsl_org examples/test.dtd examples/test.xml examples/test.xsl examples/test2.xml examples/test2.xsl examples/xmlspec.xsl examples/xpath.xsl examples/xslt.xsl t/attributes.t t/call_template.t t/cdata_sect.t t/comment.t t/copy.t t/doe.t t/features.t t/for-each.t t/forward.t t/output.t t/params.t t/pattern.t t/pi.t t/select.t t/spec_examples.t t/text_nodes.t t/variable.t t/xsl_cond.t t/zeroes.t XML-XSLT-0.48/t/0040700000076500007650000000000010015344303013323 5ustar jonathanjonathanXML-XSLT-0.48/t/copy.t0100644000076500007650000000241510014363624014501 0ustar jonathanjonathan#!/usr/bin/perl # Test xsl:copy # $Id: copy.t,v 1.1 2004/02/17 10:06:12 gellyfish Exp $ use Test::More tests => 2; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet =< bold EOS my $xml =< a EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = ''; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"apply attribute set to xsl:copy"); XML-XSLT-0.48/t/for-each.t0100644000076500007650000000670310015073021015205 0ustar jonathanjonathan#!/usr/bin/perl # Test foreach with various selects # $Id: for-each.t,v 1.1 2004/02/19 08:38:41 gellyfish Exp $ use Test::More tests 
=> 4; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet =< EOS my $xml =< EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q%Processing-Instruction 1 type='text/xml'Processing-Instruction 2 type='text/xml'%; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select multiple processing-instruction()"); eval { my $stylesheet =< EOS my $xml =< EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q% TEST COMMENT %; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select single comment()"); eval { my $stylesheet =< EOS my $xml =< TEST TEXT EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q%TEST TEXT%; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select text()"); XML-XSLT-0.48/t/select.t0100644000076500007650000000672010015073021014777 0ustar jonathanjonathan#!/usr/bin/perl # Test select/match with various special paths # $Id: select.t,v 1.2 2004/02/19 08:38:41 gellyfish Exp $ use Test::More tests => 4; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet =< EOS my $xml =< EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q%Processing-Instruction 1 type='text/xml'%; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select single processing-instruction()"); eval { my $stylesheet =< EOS my $xml =< EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q% TEST COMMENT %; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select single comment()"); eval { my $stylesheet =< EOS my 
$xml =< TEST TEXT EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = q%TEST TEXT%; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"select text()"); XML-XSLT-0.48/t/comment.t0100644000076500007650000000136710014363624015176 0ustar jonathanjonathan# Test that xsl:comment works # $Id: comment.t,v 1.2 2004/02/17 10:06:12 gellyfish Exp $ use strict; use Test::More tests => 2; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet =< Comment EOS my $xml =< Foo EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = ''; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"Comment text as expected"); XML-XSLT-0.48/t/attributes.t0100644000076500007650000001046710014370215015715 0ustar jonathanjonathan# Test that attributes work # $Id: attributes.t,v 1.4 2004/02/17 10:44:29 gellyfish Exp $ use Test::More tests => 5; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet = <

    fooFoo

    EOS my $xml =<

    Some Random text

    EOX my $expected = qq{

    Foo

    }; my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $outstr = $parser->toString(); warn "$outstr\n" if $DEBUGGING; $parser->dispose(); die "$outstr ne $expected\n" unless $outstr eq $expected; }; ok(!$@, "xsl:attribute works"); eval { my $stylesheet =< This is a summary Foo EOS my $xml =<

    Some Random text

    EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $outstr = $parser->toString() ; my $expected =<

    Foo

    EOE chomp($expected); warn "$outstr\n" if $DEBUGGING; die "$outstr ne $expected\n" unless $outstr eq $expected; $parser->dispose(); }; warn "$@\n" if $DEBUGGING; ok(!$@, "attribute-set in element"); eval { my $stylesheet =< underline black 14pt Foo EOS my $xml =<

    Some Random text

    EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $outstr = $parser->toString() ; my $expected =<Foo

    EOE chomp($expected); warn "$outstr\n" if $DEBUGGING; die "$outstr ne $expected\n" unless $outstr =~ /$expected/; $parser->dispose(); }; warn "$@\n" if $DEBUGGING; ok(!$@, "nested attribute-sets"); eval { my $stylesheet =< http://www.w3.org/1999/XSL/Transform value EOS my $xml = ''; my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $outstr = $parser->toString() ; warn "$outstr\n" if $DEBUGGING; die "$outstr contains xmlns declaration\n" if $outstr =~ /xmlns:xsl/ ; $parser->dispose(); }; warn "$@\n" if $DEBUGGING; ok(!$@, "do not output namespace declaration"); XML-XSLT-0.48/t/forward.t0100644000076500007650000000464207417005264015205 0ustar jonathanjonathan# Test forward compatibility # $Id: forward.t,v 1.2 2002/01/09 09:17:40 gellyfish Exp $ use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use Test::More tests => 8; use_ok('XML::XSLT'); my $stylesheet =< XSLT 17.0 required

    Sorry, this stylesheet requires XSLT 17.0.

    EOS my $parser; eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; warn $@ if $DEBUGGING; ok(! $@,'Forward compatibility as per 1.1 Working Draft'); my $xml = 'Test data'; my $outstr; eval { $parser->transform($xml); $outstr = $parser->toString(); die unless $outstr; }; warn $@ if $DEBUGGING; print $outstr if $DEBUGGING; ok(! $@, 'Check it can process this'); my $wanted =<XSLT 17.0 required

    Sorry, this stylesheet requires XSLT 17.0.

    EOW chomp($wanted); ok($outstr eq $wanted, 'Check it makes the right output'); $stylesheet =< Sorry, this stylesheet requires XSLT 17.0. EOS eval { $parser->dispose(); }; ok(!$@, 'dispose'); eval { $parser = XML::XSLT->new( \$stylesheet,debug => $DEBUGGING) || die; }; ok(! $@, 'Another forward compat test'); eval { $parser->transform($xml); $outstr = $parser->toString(); die unless $outstr; }; print $outstr if $DEBUGGING; ok(! $@, 'Transform this'); $wanted = 'Test data'; chomp($wanted); ok($outstr eq $wanted, 'Check it makes the right output'); XML-XSLT-0.48/t/call_template.t0100644000076500007650000000155307416542743016354 0ustar jonathanjonathan# $Id: call_template.t,v 1.1 2002/01/08 10:11:47 gellyfish Exp $ # Check call-template use strict; my $DEBUGGING = 0; use Test::More tests => 2; use_ok('XML::XSLT'); my $stylesheet =< doc found EOS my $xml = '
    '; eval { my $xslt = XML::XSLT->new($stylesheet, debug => $DEBUGGING); my $expected = 'doc found'; $xslt->transform(\$xml); my $outstr = $xslt->toString(); $outstr =~ s/^\s+//; $outstr =~ s/\s+$//; warn "$outstr\n" if $DEBUGGING; $xslt->dispose(); die "$outstr ne $expected\n" unless $outstr eq $expected; }; ok(!$@,'Call template'); XML-XSLT-0.48/t/pattern.t0100644000076500007650000000044610014115600015173 0ustar jonathanjonathan#!/usr/bin/perl # Test all patterns # $Id: pattern.t,v 1.1 2004/02/16 10:29:20 gellyfish Exp $ use strict; use Test::More tests => 2; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $parser = XML::XSLT->new(use_deprecated => 1,debug => $DEBUGGING); }; ok(1,""); XML-XSLT-0.48/t/doe.t0100644000076500007650000000214607407353670014313 0ustar jonathanjonathan# $Id: doe.t,v 1.2 2001/12/17 11:32:08 gellyfish Exp $ # check disable-output-escaping && the interface use Test::More tests => 7; use strict; use_ok('XML::XSLT'); my $parser = eval { my $stylesheet =< <&<& EOS XML::XSLT->new($stylesheet,warnings => 'Active'); }; ok(! "$@","new from stylesheet text"); ok($parser,"new successful"); eval { $parser->transform(\<

    <&

    EOX }; ok(!"$@","transform xml"); my $outstr= eval { $parser->toString }; ok(!$@,"toString works"); ok($outstr,"Output is expected"); my $correct='<&<&<&<&'; ok($outstr eq $correct,"Output is what we expected it to be"); XML-XSLT-0.48/t/spec_examples.t0100644000076500007650000001210607417005264016363 0ustar jonathanjonathan# The examples from the 1.1 Working Draft # $Id: spec_examples.t,v 1.3 2002/01/09 09:17:40 gellyfish Exp $ use Test::More tests => 8; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); # First example my $stylesheet =< <xsl:value-of select="title"/>

    NOTE:

    EOS my $xml =< Document Title Chapter Title
    Section Title This is a test. This is a note.
    Another Section Title This is another test. This is another note.
    EOX # this is not the same as that in the spec because of white space issues my $expected =< Document Title

    Document Title

    Chapter Title

    Section Title

    This is a test.

    NOTE: This is a note.

    Another Section Title

    This is another test.

    NOTE: This is another note.

    EOE chomp($expected); my $parser; eval { $parser = XML::XSLT->new($stylesheet,debug => $DEBUGGING); die unless $parser; }; warn $@ if $DEBUGGING; ok(!$@,'Can parse example stylesheet'); my $outstr; eval { $outstr = $parser->serve(\$xml,http_headers => 0); die "no output" unless $outstr; }; warn $@ if $DEBUGGING; ok(!$@,'serve produced output'); warn $outstr if $DEBUGGING; ok($outstr eq $expected,'Matches output'); $parser->dispose(); # The data example - test 'Literal result as stylesheet' $xml =< 10 9 7 4 3 4 6 -1.5 2 EOX $stylesheet =<<'EOS'; Sales Results By Division
    Division Revenue Growth Bonus
    color:red
    EOS $expected =< Sales Results By Division
    DivisionRevenueGrowthBonus
    North1097
    West6-1.52
    South434
    EOE eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; ok(!$@,'Wahay it can parse literal result'); eval { $outstr = $parser->serve(\$xml,http_headers => 0); die unless $outstr; }; ok(!$@,'serve at least did something'); ok($outstr !~ 'xsl:sort', 'xsl:sort has not reappeared'); SKIP: { skip("Doesn't handle xsl:sort properly",1); ok( $outstr eq $expected,'Great it does Literal stylesheets'); } print $outstr if $DEBUGGING; XML-XSLT-0.48/t/params.t0100644000076500007650000000200607407353671015023 0ustar jonathanjonathan# $Id: params.t,v 1.2 2001/12/17 11:32:09 gellyfish Exp $ # check params && the interface use Test::More tests => 7; use strict; use_ok('XML::XSLT'); my $parser = eval { XML::XSLT->new (<<'EOS', warnings => 'Active'); value1 undefined[ param1= ] EOS }; ok(! $@,"New from literal stylesheet"); ok($parser,"Parser is defined"); eval { $parser->transform(\< EOX }; ok(! $@,"transform from on literal XML"); my $outstr= eval { $parser->toString }; ok(! $@, "toString works"); ok($outstr,"toString created output"); my $correct='[ param1=value1 ][ param1=undefined ]'; ok( $correct eq $outstr,"Output is as expected"); XML-XSLT-0.48/t/variable.t0100644000076500007650000000742510015233010015304 0ustar jonathanjonathan#!/usr/bin/perl -w # Test for correct operation of variables # $Id: variable.t,v 1.1 2004/02/16 10:29:20 gellyfish Exp $ use Test::More tests => 14; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); # Test literal value in select my $stylesheet =<<'EOS'; EOS my $xml =< EOX my $correct = '*This is a test*'; my $parser; eval { $parser = XML::XSLT->new($stylesheet, debug => $DEBUGGING ); die unless $parser; }; warn $@ if $DEBUGGING; ok(!$@,"new from literal stylesheet"); eval { $parser->transform(\$xml); }; warn $@ if $DEBUGGING; ok(! 
$@, "transform" ); my $outstr; eval { $outstr = $parser->toString(); die unless $outstr; }; warn $outstr if $DEBUGGING; warn $@ if $DEBUGGING; ok(!$@,"toString works"); ok($outstr eq $correct,"Output meets expectations - with toString"); $stylesheet =<<'EOS'; *This is a test* EOS eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; ok(!$@, 'Can parse template value as variable'); eval { $parser->transform(\$xml); }; ok(!$@, 'transform'); eval { $outstr = $parser->toString(); die unless $outstr; }; ok(!$@,'got some output'); ok($outstr eq $correct,'Got expected output'); $stylesheet =<<'EOS'; EOS $xml =< EOX eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; ok(!$@, 'Can parse template'); eval { $parser->transform(\$xml); }; ok(!$@, 'transform'); eval { $outstr = $parser->toString(); die unless $outstr; }; ok(!$@,'got some output'); ok($outstr eq $correct,'Got expected output'); eval { $stylesheet =<<'EOS';

    param1 =

    param2 =

    EOS $xml =< EOX $parser = XML::XSLT->new($stylesheet, debug => $DEBUGGING, variables => { param1 => "One", param2 => "Two" }); $parser->transform(\$xml); $outstr = $parser->toString(); $correct = '

    param1 = One

    param2 = Two

    '; die "$outstr ne $correct" unless $outstr eq $correct; }; ok(!$@,"external variables work as expected"); XML-XSLT-0.48/t/cdata_sect.t0100644000076500007650000000367307420523607015636 0ustar jonathanjonathan# Test that cdata-section elements work # $Id: cdata_sect.t,v 1.4 2002/01/14 09:40:23 gellyfish Exp $ use Test::More tests => 7; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); # First example my $stylesheet =< <foo> EOS my $xml = ''; # this is not the same as that in the spec because of white space issues my $expected =< ]]> EOE chomp($expected); my $parser; eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; warn $@ if $DEBUGGING; ok(!$@,'Can parse example stylesheet'); my $outstr; eval { $outstr = $parser->serve(\$xml,http_headers => 0); die "no output" unless $outstr; }; warn $@ if $DEBUGGING; ok(!$@,'serve produced output'); warn $outstr if $DEBUGGING; ok($outstr eq $expected,'Matches output'); $parser->dispose(); # The data example - test 'Literal result as stylesheet' $stylesheet =<<'EOS'; ]]> EOS $expected =< ]]> EOE chomp ($expected); eval { $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); die unless $parser; }; ok(!$@,'Wahay it can parse literal result'); eval { $outstr = $parser->serve(\$xml,http_headers => 0); die unless $outstr; }; ok(!$@,'serve at least did something'); ok( $outstr eq $expected,'Preserves CDATA'); print $outstr if $DEBUGGING; XML-XSLT-0.48/t/output.t0100644000076500007650000000315707417005264015101 0ustar jonathanjonathan# Test for 'output' (which is hopefully fixed) # $Id: output.t,v 1.3 2002/01/09 09:17:40 gellyfish Exp $ use Test::More tests => 7; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); my $stylesheet = < EOS my $xml =< This is a test EOX my $parser; eval { $parser = XML::XSLT->new($stylesheet, debug => $DEBUGGING); die unless $parser; }; warn $@ if $DEBUGGING; ok (!$@,"new from literal stylesheet"); eval { 
$parser->transform(\$xml); }; warn $@ if $DEBUGGING; ok(! $@, "transform"); my $correct = "This is a test"; my $outstr; warn $outstr if $DEBUGGING; eval { $outstr = $parser->toString(); die unless $outstr; }; warn $@ if $DEBUGGING; ok(!$@,"toString works"); ok($outstr eq $correct,"Output meets expectations - with toString"); $correct =< This is a test EOC chomp($correct); eval { $outstr = $parser->serve(\$xml,http_headers => 0); die unless $outstr; }; warn $outstr if $DEBUGGING; ok(!$@,"serve(), works"); ok($outstr eq $correct,"Output meets expectations with declarations"); XML-XSLT-0.48/t/pi.t0100644000076500007650000000152210014115600014122 0ustar jonathanjonathan#!/usr/bin/perl # Test that xsl:processing-instruction works # $Id: pi.t,v 1.1 2004/02/16 10:29:20 gellyfish Exp $ use strict; use Test::More tests => 2; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); eval { my $stylesheet =< bar="foo" EOS my $xml =< Foo EOX my $parser = XML::XSLT->new(\$stylesheet,debug => $DEBUGGING); $parser->transform(\$xml); my $wanted = ''; my $outstr = $parser->toString; die "$outstr ne $wanted\n" unless $outstr eq $wanted; }; ok(!$@,"processing instruction text as expected"); XML-XSLT-0.48/t/text_nodes.t0100644000076500007650000000300207417005264015702 0ustar jonathanjonathan# Test xsl:text # $Id: text_nodes.t,v 1.1 2002/01/09 09:17:40 gellyfish Exp $ use strict; my $DEBUGGING = 0; use Test::More tests => 2; use_ok('XML::XSLT'); # xsl:sort is still broken but I am ignoring that my $stylesheet =<
  • EOS my $xml =< James Clark Daniel Veillard Michael Kay EOX my $expected =<
  • James Clark
  • Daniel Veillard
  • Michael Kay
  • EOE chomp($expected); eval { my $xslt = XML::XSLT->new($stylesheet, debug => $DEBUGGING); $xslt->transform(\$xml); my $outstr = $xslt->toString(); warn "$outstr\n" if $DEBUGGING; $xslt->dispose(); die "$outstr ne $expected\n" unless $outstr eq $expected; }; print $@ if $DEBUGGING; ok(!$@,'text node preserved'); XML-XSLT-0.48/t/features.t0100644000076500007650000000127607407353671015366 0ustar jonathanjonathan# Test miscellaneous features # $Id: features.t,v 1.1 2001/12/17 11:32:09 gellyfish Exp $ use strict; use Test::More tests => 3; use_ok('XML::XSLT'); my $sheet =< Prepare to die! EOS my $parser; eval { $parser = XML::XSLT->new(\$sheet); die unless $parser; }; ok(! $@, "Testing parse of and "); my $xml = 'foo'; SKIP: { skip("Message not implemented",1); eval { $parser->transform($xml); }; ok($@,"Message"); } XML-XSLT-0.48/t/xsl_cond.t0100644000076500007650000004565210013760171015347 0ustar jonathanjonathan# $Id: xsl_cond.t,v 1.3 2001/12/17 11:32:09 gellyfish Exp $ # check test attributes && the interface use Test::More tests => 25; use strict; use vars qw($DEBUGGING); $DEBUGGING = 0; use_ok('XML::XSLT'); # element tests eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    foosome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node string eq"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    barsome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; ok(!$@,"text node string ne"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    asome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node string lt"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    csome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node string gt"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    csome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node string ge"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    bsome random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node string le"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    42some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; ok(!$@,"text node numeric eq"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    43some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node numeric ne"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    41some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node numeric lt"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    43some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node numeric gt"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    43some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node numeric ge"); eval { my $parser = XML::XSLT->new (< $DEBUGGING); o not ok k EOS $parser->transform(\<

    41some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"text node numeric le"); # attribute tests eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute string eq"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; ok(!$@,"attribute string ne"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute string lt"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute string gt"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute string ge"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute string le"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; ok(!$@,"attribute numeric eq"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute numeric ne"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute numeric lt"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute numeric gt"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute numeric ge"); eval { my $parser = XML::XSLT->new (<<'EOS', debug => $DEBUGGING); o not ok k EOS $parser->transform(\<

    some random text

    EOX my $outstr = $parser->toString(); warn $outstr if $DEBUGGING; my $correct = 'ok'; $parser->dispose(); die "$outstr ne $correct\n" unless $outstr eq $correct; }; warn $@ if $DEBUGGING; ok(!$@,"attribute numeric le"); XML-XSLT-0.48/t/zeroes.t0100644000076500007650000000214707407353671015055 0ustar jonathanjonathan# $Id: zeroes.t,v 1.3 2001/12/17 11:32:09 gellyfish Exp $ # check the ``0'' bug && the interface use Test::More tests => 7; use strict; use_ok('XML::XSLT'); my $parser = eval { XML::XSLT->new (<<'EOS', warnings => 'Active'); 0 0 EOS }; ok(! $@,"New from literal stylesheet"); ok($parser,"Parser is a defined value"); eval { $parser->transform(\<

    00 EOX }; ok(!$@,"transform a literal XML document"); my $outstr= eval { $parser->toString }; ok(! $@, "toString doesn't die"); ok($outstr,"toString produced output"); my $correct='0000'; ok( $correct eq $outstr,"The expected output was produced"); XML-XSLT-0.48/ChangeLog0100644000076500007650000016411210015073020014641 0ustar jonathanjonathan 2004-02-18 Wednesday 2004-02-18T08:34:38Z gellyfish MANIFEST Exp 1.17 lib/XML/XSLT.pm Exp 1.24 t/select.t Exp 1.1 * Fixed select on "comment()" "processing-instruction()" etc * Added test for select 2004-02-17 Tuesday 2004-02-17T10:44:29Z gellyfish t/attributes.t Exp 1.4 * More attribute tests 2004-02-17 Tuesday 2004-02-17T10:06:04Z gellyfish MANIFEST Exp 1.16 lib/XML/XSLT.pm Exp 1.23 t/comment.t Exp 1.2 t/copy.t Exp 1.1 * Added test for xsl:copy 2004-02-17 Tuesday 2004-02-17T08:52:29Z gellyfish ChangeLog Exp 1.11 META.yml Exp 1.2 lib/XML/XSLT.pm Exp 1.22 * 'use-attribute-sets' works in xsl:copy and recursively 2004-02-16 Monday 2004-02-16T10:29:20Z gellyfish MANIFEST Exp 1.15 META.yml Exp 1.1 lib/XML/XSLT.pm Exp 1.21 t/pattern.t Exp 1.1 t/pi.t Exp 1.1 t/variable.t Exp 1.1 * Fixed variable implementation to handle non literals * refactored test implementation * added tests 2003-06-24 Tuesday 2003-06-24T16:34:51Z gellyfish lib/XML/XSLT.pm Exp 1.20 * Allowed both name and match attributes in templates * Lost redefinition warning with perl 5.8 2002-02-18 Monday 2002-02-18T22:31:59Z gellyfish ChangeLog Exp 1.10 examples/ChangeLog.xsl Exp 1.1 Added ChangeLog.xsl as an example to transform the output of cvs2pl 2002-02-18 Monday 2002-02-18T09:05:14Z gellyfish lib/XML/XSLT.pm Exp 1.19 Refactoring 2002-01-16 Wednesday 2002-01-16T21:05:26Z gellyfish MANIFEST Exp 1.14 examples/XSLT.html Exp 1.2 examples/XSLT.xml Exp 1.2 lib/XML/XSLT.pm Exp 1.18 * Added the manpage as an example * Started to properly implement omit-xml-declaration 2002-01-14 Monday 2002-01-14T09:40:22Z gellyfish MANIFEST Exp 1.13 examples/XSLT.html Exp 1.1 
examples/XSLT.xml Exp 1.1 examples/XSLT.xsl Exp 1.1 t/cdata_sect.t Exp 1.4 * Added the modules own documentation as an example 2002-01-13 Sunday 2002-01-13T10:35:00Z gellyfish lib/XML/XSLT.pm Exp 1.17 Updated pod 2002-01-09 Wednesday 2002-01-09T09:17:40Z gellyfish lib/XML/XSLT.pm Exp 1.16 t/attributes.t Exp 1.3 t/cdata_sect.t Exp 1.3 t/forward.t Exp 1.2 t/output.t Exp 1.3 t/spec_examples.t Exp 1.3 t/text_nodes.t Exp 1.1 * added test for <xsl:text> * Stylesheet whitespace stripping as per spec and altered tests ... 2002-01-08 Tuesday 2002-01-08T10:16:54Z gellyfish t/cdata_sect.t Exp 1.2 foo'd up the t/cdata_sect.t 2002-01-08 Tuesday 2002-01-08T10:11:46Z gellyfish MANIFEST Exp 1.12 lib/XML/XSLT.pm Exp 1.15 t/call_template.t Exp 1.1 t/cdata_sect.t Exp 1.1 t/spec_examples.t Exp 1.2 * First cut at cdata-section-element * test for above 2001-12-24 Monday 2001-12-24T16:00:19Z gellyfish ChangeLog Exp 1.9 VERSION-040 README Exp 1.7 VERSION-040 lib/XML/XSLT.pm Exp 1.14 VERSION-040 * Version released to CPAN 2001-12-20 Thursday 2001-12-20T09:21:42Z gellyfish lib/XML/XSLT.pm Exp 1.13 More refactoring 2001-12-19 Wednesday 2001-12-19T21:06:31Z gellyfish lib/XML/XSLT.pm Exp 1.12 t/output.t Exp 1.2 VERSION-040 * Some refactoring and style changes 2001-12-19 Wednesday 2001-12-19T09:11:14Z gellyfish lib/XML/XSLT.pm Exp 1.11 * Added more accessors for object attributes * Fixed potentially broken usage of $variables in _evaluate_template 2001-12-18 Tuesday 2001-12-18T09:10:10Z gellyfish README Exp 1.6 lib/XML/XSLT.pm Exp 1.10 t/attributes.t Exp 1.2 VERSION-040 Implemented attribute-sets 2001-12-17 Monday 2001-12-17T22:32:11Z gellyfish Makefile.PL Exp 1.7 VERSION-040 lib/XML/XSLT.pm Exp 1.9 * Added Test::More to Makefile.PL * Added _indent and _outdent methods * Placed __get_attribute_sets in transform() 2001-12-17 Monday 2001-12-17T11:32:08Z gellyfish MANIFEST Exp 1.11 VERSION-040 MANIFEST.SKIP Exp 1.1 VERSION-040 lib/XML/XSLT.pm Exp 1.8 t/attributes.t Exp 1.1 t/comment.t Exp 1.1 
VERSION-040 t/doe.t Exp 1.2 VERSION-040 t/features.t Exp 1.1 VERSION-040 t/forward.t Exp 1.1 VERSION-040 t/output.t Exp 1.1 t/params.t Exp 1.2 VERSION-040 t/spec_examples.t Exp 1.1 VERSION-040 t/xsl_cond.t Exp 1.3 VERSION-040 t/zeroes.t Exp 1.3 VERSION-040 * Rolled in various patches * Added new tests 2001-04-06 Friday 2001-04-06T02:29:05Z hexmode lib/XML/XSLT.pm Exp 1.7 must ... increment ... version 2001-04-06 Friday 2001-04-06T02:26:54Z hexmode lib/XML/XSLT.pm Exp 1.6 asString all over the place, but only toString existed. 2001-03-01 Thursday 2001-03-01T05:22:45Z hexmode README Exp 1.5 xslt-parser Exp 1.9 VERSION-040 lib/XML/XSLT.pm Exp 1.5 match CPAN 2001-01-23 Tuesday 2001-01-23T04:09:47Z hexmode ChangeLog Exp 1.8 [no log message] 2001-01-23 Tuesday 2001-01-23T04:03:02Z hexmode lib/XML/XSLT.pm Exp 1.4 Document support for xsl:output. Document support with xsl:call-template. Better code to find the first /Element/ of a stylesheet rather than the first Child of the XML::DOM object Fix up xsl:output handling of HTML and XML types 2001-01-23 Tuesday 2001-01-23T03:49:18Z hexmode xslt-parser Exp 1.8 Better handling of files when specifying an extention. 2001-01-16 Tuesday 2001-01-16T03:25:07Z hexmode lib/XML/XSLT.pm Exp 1.3 Version Bump 2001-01-16 Tuesday 2001-01-16T03:24:13Z hexmode ChangeLog Exp 1.7 [no log message] 2001-01-16 Tuesday 2001-01-16T03:17:35Z hexmode lib/XML/XSLT.pm Exp 1.2 Package Seperation 2001-01-16 Tuesday 2001-01-16T03:14:14Z hexmode MANIFEST Exp 1.10 Makefile.PL Exp 1.6 XSLT.pm dead 1.25 lib/XML/XSLT.pm Exp 1.1 t/xsl_cond.t Exp 1.2 Moving File for CPAN compatiblity 2001-01-06 Saturday 2001-01-06T07:18:18Z hexmode XSLT.pm Exp 1.24 Oops! 
Integrated lost changes 2001-01-06 Saturday 2001-01-06T04:00:18Z hexmode XSLT.pm Exp 1.23 Cleanup Arrayref -> hashref 2000-10-30 Monday 2000-10-30T22:06:15Z nejedly XSLT.pm Exp 1.22 t/params.t Exp 1.1 fixed problems with variables 2000-09-23 Saturday 2000-09-23T10:13:15Z nejedly XSLT.pm Exp 1.21 fixed test="child=''" 2000-09-23 Saturday 2000-09-23T09:45:23Z nejedly XSLT.pm Exp 1.20 t/xsl_cond.t Exp 1.1 fixed problem with <xsl:if test='children="something"'> 2000-09-20 Wednesday 2000-09-20T17:58:17Z nejedly XSLT.pm Exp 1.19 t/doe.t Exp 1.1 Added support for disable-output-escaping + test script 2000-08-10 Thursday 2000-08-10T14:29:19Z nejedly t/zeroes.t Exp 1.2 one more detail :) 2000-08-10 Thursday 2000-08-10T13:55:41Z nejedly t/00-load.t dead 1.2 no longer needed. 2000-08-10 Thursday 2000-08-10T13:38:00Z nejedly MANIFEST Exp 1.9 XSLT.pm Exp 1.18 fixed __add_default_templates. fixed ``zero bug'' changed MANIFEST (added t/zeroes.t and removed t/00-load.t as it's no longer needed) 2000-08-10 Thursday 2000-08-10T13:33:59Z nejedly t/zeroes.t Exp 1.1 added test script for the ``zero bug'' 2000-08-09 Wednesday 2000-08-09T03:53:26Z mike808 XSLT.pm Exp 1.8.6.11 cpan-0_2x-maint bugfixes, compiles, but needs verification 2000-08-07 Monday 2000-08-07T06:55:25Z mike808 XSLT.pm Exp 1.8.6.10 cpan-0_2x-maint optimize copy-on-add to direct assignment, fix match with \n issues, compiles 2000-08-07 Monday 2000-08-07T02:54:54Z mike808 XSLT/Tag_Compression.pm dead 1.1.2.2 cpan-0_2x-maint moved to XML/XSLT 2000-08-07 Monday 2000-08-07T02:39:18Z mike808 XML/XSLT/Tag_Compression.pm Exp 1.1.2.1 cpan-0_2x-maint Moved from XSLT directory. 2000-07-31 Monday 2000-07-31T02:07:55Z hexmode XSLT.pm Exp 1.17 make REs case sensisitive. replace line endings with \n 2000-07-30 Sunday 2000-07-30T20:12:17Z mike808 Makefile.PL Exp 1.1.2.2 cpan-0_2x-maint Add CVS markers, clean up 2000-07-30 Sunday 2000-07-30T19:49:38Z mike808 XSLT.pm Exp 1.8.6.9 cpan-0_2x-maint Apply Marc Lehman's patch. 
2000-07-30 Sunday 2000-07-30T19:47:00Z mike808 CREDITS.pod Exp 1.1.2.4 cpan-0_2x-maint Added Marc Lehman to contributors list 2000-07-29 Saturday 2000-07-29T19:30:17Z hexmode XSLT.pm Exp 1.16 incorporate Pavel Nejedly's fixes to __string__ for CDATA 2000-07-27 Thursday 2000-07-27T22:16:55Z hexmode xslt-parser Exp 1.7 option for HTML::Clean 2000-07-27 Thursday 2000-07-27T22:15:49Z hexmode XSLT.pm Exp 1.15 Option for serve to use HTML::Clean restoration of __get_stylesheet removed buglets on default namespace extraction addition of <xml:element> removal of regexps in _evaluate_element 2000-07-27 Thursday 2000-07-27T07:19:02Z hexmode Makefile.PL Exp 1.5 Fixing executable 2000-07-27 Thursday 2000-07-27T07:17:54Z hexmode XSLT.pm Exp 1.14 Fixing __open_by_filename 2000-07-27 Thursday 2000-07-27T05:42:53Z mike808 CREDITS.pod Exp 1.1.2.3 cpan-0_2x-maint somebody might run this as a perl script. Added END tag 2000-07-24 Monday 2000-07-24T20:03:09Z hexmode CREDITS.pod Exp 1.1.2.2 cpan-0_2x-maint My blurb 2000-07-23 Sunday 2000-07-23T04:27:25Z mike808 XSLT.pm Exp 1.8.6.8 cpan-0_2x-maint Merge Bron's OO debug interface. Defined methos only. Not in use yet. 2000-07-23 Sunday 2000-07-23T03:46:25Z mike808 XSLT/Tag_Compression.pm Exp 1.1.2.1 cpan-0_2x-maint LICENSE Exp 1.1.2.1 cpan-0_2x-maint CREDITS.pod Exp 1.1.2.1 cpan-0_2x-maint XSLT.pm Exp 1.8.6.7 cpan-0_2x-maint formalized code style, pods, and structure 2000-07-22 Saturday 2000-07-22T22:34:23Z mike808 XSLT.pm Exp 1.8.6.6 cpan-0_2x-maint Major style overhaul, improve coding consistency. 1.8.6.5 merged. 
2000-07-18 Tuesday 2000-07-18T04:59:17Z hexmode XSLT.pm Exp 1.8.6.5 cpan-0_2x-maint typos 2000-07-14 Friday 2000-07-14T01:26:42Z hexmode XSLT.pm Exp 1.13 Incorporate bug fixes for 0.24 and Mike's docs 2000-07-14 Friday 2000-07-14T01:13:10Z hexmode XSLT.pm Exp 1.12 proper base handling and the completed serve 2000-07-14 Friday 2000-07-13T23:18:07Z hexmode XSLT.pm Exp 1.11 Moved from LWP::UserAgent to LWP::Simple* Added URI::Heuristic* Change constant names for move to XPATH. * These moves should be reconsidered, but they are relatively minor, so I'll leave them in. 2000-07-14 Friday 2000-07-13T23:08:48Z hexmode XSLT.pm Exp 1.10 Move from hashref to arrayref. Use already-exported constants from XML::DOM instead of assigning them to scalars. 2000-07-14 Friday 2000-07-13T23:03:36Z hexmode XSLT.pm Exp 1.9 Documentation changes. Start of deprecation handling. $parser => $Self 2000-07-13 Thursday 2000-07-13T22:54:17Z hexmode xslt-parser Exp 1.6 Completed move to 0.30 API 2000-07-13 Thursday 2000-07-13T22:53:27Z hexmode xslt-parser Exp 1.5 Added a pod and fixed a small bug 2000-07-13 Thursday 2000-07-13T22:51:22Z hexmode xslt-parser Exp 1.4 Cleaned up version using Getopt::Std 2000-07-13 Thursday 2000-07-13T22:48:46Z hexmode MANIFEST Exp 1.8 Makefile.PL Exp 1.4 README Exp 1.4 Bringing up my 0.30 2000-07-13 Thursday 2000-07-13T22:33:46Z hexmode ChangeLog Exp 1.6 Bringing in Bron's entries for his bug fixes 2000-07-13 Thursday 2000-07-13T21:28:28Z hexmode website/changelog.html dead 1.2 website/index.html dead 1.8 website/ main branch update to reality 2000-07-13 Thursday 2000-07-13T21:09:11Z hexmode XSLT.pm Exp 1.8.6.4 cpan-0_2x-maint Mike's cleanup of XSLT.pm 2000-07-13 Thursday 2000-07-13T20:08:25Z hexmode ChangeLog Exp 1.1.4.2 cpan-0_2x-maint latest-cpan MANIFEST Exp 1.4.4.2 cpan-0_2x-maint latest-cpan Make the rest of the files here match 0.24 2000-07-13 Thursday 2000-07-13T20:05:38Z hexmode XSLT.pm Exp 1.8.6.3 cpan-0_2x-maint latest-cpan Bron's compile fixes for 0.24 
2000-07-13 Thursday 2000-07-13T20:03:56Z hexmode XSLT.pm Exp 1.8.6.2 cpan-0_2x-maint Bron's changes for 0.24 2000-07-10 Monday 2000-07-10T14:09:57Z brong MANIFEST Exp 1.7 update for missing tests 2000-07-10 Monday 2000-07-10T14:09:29Z brong t/01-simple.t dead 1.9 t/02-grammer.t dead 1.2 t/ remove broken tests 2000-07-10 Monday 2000-07-10T13:58:34Z brong ChangeLog Exp 1.5 Explain all about what I've done for 0.24 2000-07-10 Monday 2000-07-10T13:35:34Z brong xslt-parser Exp 1.3 examples/REC-xslt-19991116.xml Exp 1.3 VERSION-040 examples/xmlspec.xsl Exp 1.3 VERSION-040 Bringing the repository back into line with the CPAN releases 2000-07-10 Monday 2000-07-10T13:27:00Z brong ChangeLog Exp 1.4 Back to 0.23 2000-07-10 Monday 2000-07-10T13:23:52Z brong Makefile.PL Exp 1.3 Back to 0.23 2000-07-10 Monday 2000-07-10T01:21:16Z hexmode ChangeLog Exp 1.3 MANIFEST Exp 1.6 Makefile.PL Exp 1.2 xslt-parser Exp 1.2 examples/REC-xslt-19991116.xml Exp 1.2 examples/xmlspec.xsl Exp 1.2 Merge changes I made on cpan-0_23 branch 2000-06-29 Thursday 2000-06-29T16:32:15Z hexmode XSLT.pm Exp 1.8.6.1 cpan-0_2x-maint synced 2000-06-29 Thursday 2000-06-29T16:27:51Z hexmode ChangeLog Exp 1.1.4.1 cpan-0_2x-maint MANIFEST Exp 1.4.4.1 cpan-0_2x-maint Makefile.PL Exp 1.1.2.1 cpan-0_2x-maint latest-cpan README Exp 1.3.4.1 cpan-0_2x-maint latest-cpan xslt-parser Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/91-22-5.cml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/91-22-5.cml.xml Exp 1.1.4.1 cpan-0_2x-maint latest-cpan examples/91-22-5.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/95-48-7.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/95-48-7.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/REC-xml-19980210.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/REC-xslt-19991116.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/agenda.dtd Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/agenda.html Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/agenda.xml Exp 1.1.2.1 
cpan-0_2x-maint latest-cpan examples/agenda.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/bernhard.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/bernhard.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/cml.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/cml2cml.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/grammar.xml Exp 1.1.4.1 cpan-0_2x-maint latest-cpan examples/grammar.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/grammar2.xml Exp 1.1.4.1 cpan-0_2x-maint latest-cpan examples/grammar2.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/identity.xml Exp 1.1.4.1 cpan-0_2x-maint latest-cpan examples/identity.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/identity.xsl_org Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/test.dtd Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/test.xml Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/test.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/test2.xml Exp 1.1.4.1 cpan-0_2x-maint latest-cpan examples/test2.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/xmlspec.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/xpath.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan examples/xslt.xsl Exp 1.1.2.1 cpan-0_2x-maint latest-cpan t/00-load.t Exp 1.1.4.1 cpan-0_2x-maint latest-cpan cpan release sync 2000-06-28 Wednesday 2000-06-28T06:08:08Z brong ChangeLog Exp 1.2 MANIFEST Exp 1.5 Bugpatch to method dispose() in XSLT.pm Removed 01-simple.t from the MANIFEST because it breaks on make test due to incompleteness Updated changelog with details. 2000-06-22 Thursday 2000-06-22T17:32:18Z hexmode website/index.html Exp 1.7.2.2 BRON1 Update mailing list info. 
2000-06-22 Thursday 2000-06-22T17:28:09Z hexmode website/index.html Exp 1.7.2.1 BRON1 Update mailing list Links 2000-06-22 Thursday 2000-06-22T09:31:57Z hexmode XSLT/process_stylesheet.pm Exp 1.1.2.2 BRON1 Simplified regexp, fixed bug w/ 'stylesheet', outlined idea for namespace resolution 2000-06-21 Wednesday 2000-06-21T15:59:10Z hexmode XSLT.pm Exp 1.8.4.1 BRON1 XSLT/process_stylesheet.pm Exp 1.1.2.1 BRON1 Bron's Changes 2000-06-21 Wednesday 2000-06-21T15:58:15Z hexmode t/01-simple.t Exp 1.8 cpan-0_24 [no log message] 2000-06-21 Wednesday 2000-06-21T15:54:33Z hexmode t/01-simple.t Exp 1.5 t/01-simple.t Exp 1.6 t/01-simple.t Exp 1.7 t/ [no log message] 2000-06-21 Wednesday 2000-06-21T15:53:45Z hexmode t/01-simple.t Exp 1.4 Bron's Changes 2000-06-21 Wednesday 2000-06-21T03:41:18Z hexmode XSLT.pm Exp 1.7.2.1 Updated webpage info 2000-06-21 Wednesday 2000-06-21T03:22:14Z hexmode t/01-simple.t Exp 1.3 cpan-0_30 Set up test skeleton. 2000-06-20 Tuesday 2000-06-20T20:05:58Z hexmode website/index.html Exp 1.7 cpan-0_24 Making the webpage valid XHTML 2000-06-20 Tuesday 2000-06-20T20:03:35Z hexmode website/index.html Exp 1.6 update 2000-06-20 Tuesday 2000-06-20T19:22:15Z hexmode website/index.html Exp 1.5 Test 2000-06-20 Tuesday 2000-06-20T19:13:45Z hexmode website/index.html Exp 1.4 Testing web update. 2000-06-20 Tuesday 2000-06-20T01:18:58Z hexmode website/index.html Exp 1.3 Remove Listbot 2000-06-19 Monday 2000-06-19T06:31:02Z egonw website/index.html Exp 1.2 Added SourceForge site. (Do we officially need a banner? Or just a link?) Removed ListBot maillist code. Changed Bug Report link to SF. 2000-06-19 Monday 2000-06-19T03:43:37Z hexmode README Exp 1.3 Obligatory note about interface changes. 2000-06-19 Monday 2000-06-19T03:27:37Z hexmode README Exp 1.2 Cleaned up... Changed some pointers. 
2000-06-19 Monday 2000-06-19T03:05:23Z hexmode XSLT.pm Exp 1.8 Geert's Updates 2000-06-19 Monday 2000-06-19T00:43:36Z hexmode t/01-simple.t Exp 1.2 cut the warn 2000-06-16 Friday 2000-06-16T21:57:15Z hexmode t/02-grammer.t Exp 1.1 cpan-0_24 cpan-0_30 Yet another test 2000-06-16 Friday 2000-06-16T21:36:54Z hexmode ChangeLog Exp 1.1 MANIFEST Exp 1.4 changelog.html dead 1.2 index.txt dead 1.2 website/changelog.html Exp 1.1 cpan-0_24 Moving website to CVS 2000-06-16 Friday 2000-06-16T21:28:54Z hexmode XSLT.pm Exp 1.7 Removed un-needed BEGIN section. 2000-06-16 Friday 2000-06-16T19:10:15Z hexmode MANIFEST Exp 1.3 index.html dead 1.2 website/index.html Exp 1.1 Creating subdir for website 2000-06-16 Friday 2000-06-16T18:51:15Z hexmode MANIFEST Exp 1.2 t/01-simple.t Exp 1.1 More Tests 2000-06-15 Thursday 2000-06-15T16:59:17Z hexmode t/00-load.t Exp 1.1 First simple test. 2000-06-15 Thursday 2000-06-15T05:08:04Z brong XSLT.pm Exp 1.6 Added timing code to debug mode (requires Time::HiRes, but degrades gracefully without it) 2000-06-14 Wednesday 2000-06-14T16:17:00Z brong XSLT.pm Exp 1.5 Updated the version number to 0.22 to reflect changes 2000-06-14 Wednesday 2000-06-14T16:13:05Z brong XSLT.pm Exp 1.4 Fix <xsl:value-of select="@attr"> where @attr has value "0" giving "" rather than "0". 2000-06-09 Friday 2000-06-09T06:54:12Z brong examples/identity.xml Exp 1.1 VERSION-040 examples/test2.xml Exp 1.1 VERSION-040 examples/ Added all the files that were missing in the repository before (at least I will have if it can handle these symlinks) - ready to allow a clean "make dist" and branch off experimental XPath code. 2000-06-09 Friday 2000-06-09T06:36:44Z brong examples/grammar2.xml Exp 1.1 VERSION-040 Added all the files that were missing in the repository before (at least I will have if it can handle these symlinks) - ready to allow a clean "make dist" and branch off experimental XPath code. 
2000-06-09 Friday 2000-06-09T06:33:33Z brong examples/91-22-5.cml.xml Exp 1.1 VERSION-040 examples/grammar.xml Exp 1.1 VERSION-040 examples/ Added all the files that were missing in the repository before (at least I will have if it can handle these symlinks) - ready to allow a clean "make dist" and branch off experimental XPath code. 2000-06-06 Tuesday 2000-06-06T09:57:50Z brong XSLT.pm Exp 1.3 Changed @EXPORT to @EXPORT_OK in 'usr vars' to fix error under strict vars. Removed &dispose from @EXPORT_OK because it shouldn't ever be exported directly. Added a $Id: ChangeLog,v 1.12 2004/02/19 08:38:40 gellyfish Exp $ to the comment part at the top to track revisions. 2000-06-01 Thursday 2000-06-01T02:35:41Z hexmode MANIFEST Exp 1.1 XSLT.pm Exp 1.2 Need to know what ships with the package. 2000-06-01 Thursday 2000-06-01T02:22:03Z hexmode xslt-parser Exp 1.1.1.1 rel-0_21 Makefile.PL Exp 1.1.1.1 rel-0_21 README Exp 1.1.1.1 rel-0_21 XSLT.pm Exp 1.1.1.1 rel-0_21 index.txt Exp 1.1.1.1 rel-0_21 changelog.html Exp 1.1.1.1 rel-0_21 index.html Exp 1.1.1.1 rel-0_21 examples/91-22-5.cml Exp 1.1.1.1 rel-0_21 examples/91-22-5.xml Exp 1.1.1.1 rel-0_21 examples/95-48-7.xml Exp 1.1.1.1 rel-0_21 examples/95-48-7.xsl Exp 1.1.1.1 rel-0_21 examples/REC-xml-19980210.xml Exp 1.1.1.1 rel-0_21 examples/REC-xslt-19991116.xml Exp 1.1.1.1 rel-0_21 examples/agenda.dtd Exp 1.1.1.1 rel-0_21 examples/agenda.html Exp 1.1.1.1 rel-0_21 examples/agenda.xml Exp 1.1.1.1 rel-0_21 examples/agenda.xsl Exp 1.1.1.1 rel-0_21 examples/bernhard.xml Exp 1.1.1.1 rel-0_21 examples/bernhard.xsl Exp 1.1.1.1 rel-0_21 examples/cml.xsl Exp 1.1.1.1 rel-0_21 examples/cml2cml.xsl Exp 1.1.1.1 rel-0_21 examples/grammar.xsl Exp 1.1.1.1 rel-0_21 examples/grammar2.xsl Exp 1.1.1.1 rel-0_21 examples/identity.xsl Exp 1.1.1.1 rel-0_21 examples/identity.xsl_org Exp 1.1.1.1 rel-0_21 examples/test.dtd Exp 1.1.1.1 rel-0_21 examples/test.xml Exp 1.1.1.1 rel-0_21 examples/test.xsl Exp 1.1.1.1 rel-0_21 examples/test2.xsl Exp 1.1.1.1 
rel-0_21 examples/xmlspec.xsl Exp 1.1.1.1 rel-0_21 examples/xpath.xsl Exp 1.1.1.1 rel-0_21 examples/xslt.xsl Exp 1.1.1.1 rel-0_21 From CPAN v 0.21 2000-06-01 Thursday 2000-06-01T02:22:03Z hexmode xslt-parser Exp 1.1 Makefile.PL Exp 1.1 README Exp 1.1 XSLT.pm Exp 1.1 index.txt Exp 1.1 changelog.html Exp 1.1 index.html Exp 1.1 examples/91-22-5.cml Exp 1.1 VERSION-040 examples/91-22-5.xml Exp 1.1 VERSION-040 examples/95-48-7.xml Exp 1.1 VERSION-040 examples/95-48-7.xsl Exp 1.1 VERSION-040 examples/REC-xml-19980210.xml Exp 1.1 VERSION-040 examples/REC-xslt-19991116.xml Exp 1.1 examples/agenda.dtd Exp 1.1 VERSION-040 examples/agenda.html Exp 1.1 VERSION-040 examples/agenda.xml Exp 1.1 VERSION-040 examples/agenda.xsl Exp 1.1 VERSION-040 examples/bernhard.xml Exp 1.1 VERSION-040 examples/bernhard.xsl Exp 1.1 VERSION-040 examples/cml.xsl Exp 1.1 VERSION-040 examples/cml2cml.xsl Exp 1.1 VERSION-040 examples/grammar.xsl Exp 1.1 VERSION-040 examples/grammar2.xsl Exp 1.1 VERSION-040 examples/identity.xsl Exp 1.1 VERSION-040 examples/identity.xsl_org Exp 1.1 VERSION-040 examples/test.dtd Exp 1.1 VERSION-040 examples/test.xml Exp 1.1 VERSION-040 examples/test.xsl Exp 1.1 VERSION-040 examples/test2.xsl Exp 1.1 VERSION-040 examples/xmlspec.xsl Exp 1.1 examples/xpath.xsl Exp 1.1 VERSION-040 examples/xslt.xsl Exp 1.1 VERSION-040 Initial revision XML-XSLT-0.48/README0100755000076500007650000000436207411650423013770 0ustar jonathanjonathan Perl module: XML::XSLT Copyright (c) 1999 Geert Josten & Egon Willighagen. Copyright (c) 2001 Mark A. Hershberger All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. The Perl licence can be found in the README file in the Perl source distribution. Information can be found at: http://xmlxslt.sourceforge.net/ The module can be found at: http://www.cpan.org/modules/by-module/XML *** THIS IS ALPHA SOFTWARE *** Expect the interface to change between versions. 
This is a Perl module to parse XSLT (XSL Transformations) stylesheets.
For a description of XSLT, see http://www.w3.org/TR/xslt.

Currently, it uses XML::Parser and XML::DOM, but an effort is being made
to use XML::XPath.

XML::Parser is a Perl extension interface to James Clark's XML parser,
expat. It requires at least version 5.004 of perl and can be found on
CPAN.

XML::DOM is a Perl module that allows XML::Parser to build an Object
Oriented data structure with a DOM Level 1 compliant interface.

While we are working towards conformance with the W3C spec, this is an
alpha version and the module does not conform to the XSLT working draft
at this moment.

New releases will be announced on the perl-xml mailing list [You can
subscribe to this list by sending a message to
subscribe-perl-xml@lyris.activestate.com] and to the XML::XSLT mailing
list [subscribe by sending a message to xmlxslt-subscribe@listbot.com].

Please post bug reports to

To configure this module, cd to the directory that contains this README
file and type the following.

    perl Makefile.PL

Then to build you run make.

    make

Run the tests:

    make test

If you have write access to the perl library directories, you may then
install by typing (as superuser):

    make install

If you want to install this module somewhere other than the standard
location then you should use:

    perl Makefile.PL PREFIX=/location/of/libs

at the first stage above and then in a program that wants to use the
module you should put:

    use lib qw(/location/of/libs/lib/perl5);

early on in the program.

DEPENDENCIES

This module requires XML::DOM (and its dependency XML::Parser) and also
requires Test::More to perform its tests.
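
EXAMPLE

The following is a minimal sketch of a program that uses the module
after a PREFIX install, not a definitive recipe. It assumes the
placeholder prefix /location/of/libs from the instructions above, and
transform_file() is a hypothetical helper name chosen for illustration.
The calls it makes (new with a Source argument, transform, toString)
are the ones defined in lib/XML/XSLT.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the module was installed with PREFIX=/location/of/libs.
# A non-existent directory here is harmless; adjust it to your prefix.
use lib qw(/location/of/libs/lib/perl5);

# transform_file() is an illustrative helper, not part of XML::XSLT.
# It applies the stylesheet in $xsl_file to the document in $xml_file
# and returns the result as a string.
sub transform_file {
    my ( $xsl_file, $xml_file ) = @_;

    # Load the module at run time so that a missing installation
    # produces a readable error message.
    eval { require XML::XSLT; 1 }
        or die "XML::XSLT is not installed or not in \@INC: $@";

    my $xslt = XML::XSLT->new( Source => $xsl_file, warnings => 1 );
    $xslt->transform($xml_file);
    return $xslt->toString;
}

# Only run when called with a stylesheet and a document.
print transform_file(@ARGV), "\n" if @ARGV == 2;
```

Saved as, say, apply.pl, this could be run as
"perl apply.pl examples/agenda.xsl examples/agenda.xml". As this is
alpha software, expect the interface to change between versions.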
XML-XSLT-0.48/xslt-parser0100755000076500007650000000423507247356245015332 0ustar jonathanjonathan
#!/usr/local/bin/perl -w
# $Id: xslt-parser,v 1.9 2001/03/01 05:22:45 hexmode Exp $

use strict;
use XML::XSLT;
use Getopt::Std;

=head1 NAME

xslt-parser - XSLT transformations

=head1 SYNOPSIS

xslt-parser [options] <project>

=head1 DESCRIPTION

xslt-parser performs stylesheet transformations. When given a project
name, it appends `.xsl' for the XSLT stylesheet and `.xml' for the XML
file to apply the stylesheet to and performs the transformation using
the XML::XSLT perl module.

=head1 OPTIONS

=over 4

=item -c

Pass through HTML::Clean. You must have HTML::Clean installed.

=item -d

Turns debugging on. This can produce a lot of noise.

=item -n

NoWeb. You can use xslt-parser as a CGI script. With this option, it
will not output the headers that are usually needed.

=item -s

Specify a separate stylesheet. Usually, xslt-parser will simply append
`.xsl' to the project name to get the stylesheet. A different stylesheet
can be specified using this option.

=item -w

Turns warnings on.

=back

=head1 AUTHORS

Geert Josten, Mark A. Hershberger

=head1 SEE ALSO

L<XML::XSLT>

The w3.org XSLT recommendation at L<http://www.w3.org/TR/xslt>

=cut

my %opt;
my $usage = "Usage: $0 [options] <project>
	-c pass output through HTML::Clean
	-d turn debugging mode on
	-n don't print content-type
	-s <stylesheet> use <stylesheet> instead of <project>.xsl as template
	-w turns warnings on\n";

getopts('cdnws:h', \%opt) || die $usage;
die $usage if $opt{h};

## global vars ##
my $noweb    = $opt{n} || 0;
my $debug    = $opt{d} || 0;
my $warnings = $opt{w} || 0;
my $project  = shift || die $usage;
my $xslfile  = $opt{'s'} || "$project.xsl";
my $xmlfile  = $project;

# Check for the stylesheet before XML::XSLT tries to open it.
if (! -f $xslfile) {
    die "$xslfile does not exist.";
}

# Pass the -w setting through as the warnings flag.
my $xslt = XML::XSLT->new (Source   => $xslfile,
                           debug    => $debug,
                           warnings => $warnings);

# Try the project name as given, then with `.xml' appended.
if (! -f $xmlfile) {
    $xmlfile .= ".xml";
    if (! -f $xmlfile) {
        die "$xmlfile does not exist."
} } print STDERR qq{Debug : "$debug" NoWeb : "$noweb" Warnings: "$warnings" Project : "$project" XML-file: "$xmlfile" XSL-file: "$xslfile" } if $debug; print $xslt->serve(Source => $xmlfile, clean => $opt{c}, http_headers => ! $noweb); XML-XSLT-0.48/META.yml0100644000076500007650000000055710015344302014346 0ustar jonathanjonathan#XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX# name: XML-XSLT version: 0.48 version_from: lib/XML/XSLT.pm installdirs: site requires: Test::More: 0.33 XML::DOM: 1.25 XML::Parser: 2.23 distribution_type: module generated_by: ExtUtils::MakeMaker version 6.13 XML-XSLT-0.48/Makefile.PL0100755000076500007650000000123010015233226015041 0ustar jonathanjonathan#!/usr/local/bin/perl use ExtUtils::MakeMaker; WriteMakefile( ABSTRACT_FROM => 'lib/XML/XSLT.pm', ABSTRACT => 'Conversion of XML files with XSLT.', AUTHOR => 'Geert Josten (gjosten@sci.kun.nl) and Egon Willighagen (egonw@sci.kun.nl)', NAME => 'XML::XSLT', dist => { COMPRESS => 'gzip', SUFFIX => '.gz'}, VERSION_FROM => 'lib/XML/XSLT.pm', PREREQ_PM => { XML::Parser => '2.23', XML::DOM => '1.25', Test::More => '0.33' }, EXE_FILES => ['xslt-parser'], ); XML-XSLT-0.48/lib/0040700000076500007650000000000010015344303013626 5ustar jonathanjonathanXML-XSLT-0.48/lib/XML/0040700000076500007650000000000010015344303014266 5ustar jonathanjonathanXML-XSLT-0.48/lib/XML/XSLT.pm0100644000076500007650000031247410015250661015443 0ustar jonathanjonathan############################################################################## # # Perl module: XML::XSLT # # By Geert Josten, gjosten@sci.kun.nl # and Egon Willighagen, egonw@sci.kun.nl # # $Log: XSLT.pm,v $ # Revision 1.25 2004/02/19 08:38:40 gellyfish # * Fixed overlapping attribute-sets # * Allow multiple nodes for processing-instruction() etc # * Added test for for-each # # Revision 1.24 2004/02/18 08:34:38 gellyfish # * Fixed select on "comment()" "processing-instruction()" etc # * Added test for select # # Revision 1.23 2004/02/17 
10:06:12 gellyfish # * Added test for xsl:copy # # Revision 1.22 2004/02/17 08:52:29 gellyfish # * 'use-attribute-sets' works in xsl:copy and recursively # # Revision 1.21 2004/02/16 10:29:20 gellyfish # * Fixed variable implementation to handle non literals # * refactored test implementation # * added tests # # Revision 1.20 2003/06/24 16:34:51 gellyfish # * Allowed both name and match attributes in templates # * Lost redefinition warning with perl 5.8 # # Revision 1.19 2002/02/18 09:05:14 gellyfish # Refactoring # # Revision 1.18 2002/01/16 21:05:27 gellyfish # * Added the manpage as an example # * Started to properly implement omit-xml-declaration # # Revision 1.17 2002/01/13 10:35:00 gellyfish # Updated pod # # Revision 1.16 2002/01/09 09:17:40 gellyfish # * added test for # * Stylesheet whitespace stripping as per spec and altered tests ... # # Revision 1.15 2002/01/08 10:11:47 gellyfish # * First cut at cdata-section-element # * test for above # # Revision 1.14 2001/12/24 16:00:19 gellyfish # * Version released to CPAN # # Revision 1.13 2001/12/20 09:21:42 gellyfish # More refactoring # # Revision 1.12 2001/12/19 21:06:31 gellyfish # * Some refactoring and style changes # # Revision 1.11 2001/12/19 09:11:14 gellyfish # * Added more accessors for object attributes # * Fixed potentially broken usage of $variables in _evaluate_template # # Revision 1.10 2001/12/18 09:10:10 gellyfish # Implemented attribute-sets # # Revision 1.9 2001/12/17 22:32:12 gellyfish # * Added Test::More to Makefile.PL # * Added _indent and _outdent methods # * Placed __get_attribute_sets in transform() # # Revision 1.8 2001/12/17 11:32:08 gellyfish # * Rolled in various patches # * Added new tests # # ############################################################################### =head1 NAME XML::XSLT - A perl module for processing XSLT =cut ###################################################################### package XML::XSLT; 
###################################################################### use strict; use XML::DOM 1.25; use LWP::Simple qw(get); use URI; use Cwd; use File::Basename qw(dirname); use Carp; # Namespace constants use constant NS_XSLT => 'http://www.w3.org/1999/XSL/Transform'; use constant NS_XHTML => 'http://www.w3.org/TR/xhtml1/strict'; use vars qw ( $VERSION @ISA @EXPORT_OK $AUTOLOAD ); $VERSION = '0.48'; @ISA = qw( Exporter ); @EXPORT_OK = qw( &transform &serve ); my %deprecation_used; ###################################################################### # PUBLIC DEFINITIONS sub new { my $class = shift; my $self = bless {}, $class; my %args = $self->__parse_args(@_); $self->{DEBUG} = defined $args{debug} ? $args{debug} : ""; no strict 'subs'; if ( $self->{DEBUG} ) { *__PACKAGE__::debug = \&debug; } else { *__PACKAGE__::debug = sub {}; } use strict 'subs'; $self->{INDENT} = defined $args{indent} ? $args{indent} : 0; $self->{PARSER} = XML::DOM::Parser->new(); $self->{PARSER_ARGS} = defined $args{DOMparser_args} ? $args{DOMparser_args} : {}; $self->{VARIABLES} = defined $args{variables} ? $args{variables} : {}; $self->debug(join ' ', keys %{$self->{VARIABLES}}); $self->{WARNINGS} = defined $args{warnings} ? $args{warnings} : 0; $self->{INDENT_INCR} = defined $args{indent_incr} ? $args{indent_incr} : 1; $self->{XSL_BASE} = defined $args{base} ? $args{base} : 'file://' . cwd . '/'; $self->{XML_BASE} = defined $args{base} ? $args{base} : 'file://' . cwd . 
'/';

    $self->use_deprecated( $args{use_deprecated} )
      if exists $args{use_deprecated};

    $self->debug("creating parser object:");
    $self->_indent();
    $self->open_xsl(%args);
    $self->_outdent();

    return $self;
}

sub use_deprecated {
    my ( $self, $use_deprecated ) = @_;

    if ( defined $use_deprecated ) {
        $self->{USE_DEPRECATED} = $use_deprecated;
    }

    return $self->{USE_DEPRECATED} || 0;
}

sub DESTROY { }    # Cuts out random dies on includes

sub default_xml_version {
    my ( $self, $xml_version ) = @_;

    if ( defined $xml_version ) {
        $self->{DEFAULT_XML_VERSION} = $xml_version;
    }

    return $self->{DEFAULT_XML_VERSION} ||= '1.0';
}

sub serve {
    my $self  = shift;
    my $class = ref $self || croak "Not a method call";
    my %args  = $self->__parse_args(@_);
    my $ret;

    # Defaults for the arguments that were not passed in
    $args{http_headers}    = 1 unless defined $args{http_headers};
    $args{xml_declaration} = 1 unless defined $args{xml_declaration};
    $args{xml_version}     = $self->default_xml_version()
      unless defined $args{xml_version};
    $args{doctype} = 'SYSTEM' unless defined $args{doctype};
    $args{clean}   = 0        unless defined $args{clean};

    $ret = $self->transform( $args{Source} )->toString;

    if ( $args{clean} ) {
        eval { require HTML::Clean };

        if ($@) {
            CORE::warn("Not passing through HTML::Clean -- install the module");
        }
        else {
            my $hold = HTML::Clean->new( \$ret );
            $hold->strip;
            $ret = ${ $hold->data };
        }
    }

    if ( my $doctype = $self->doctype() ) {
        $ret = $doctype . "\n" . $ret;
    }

    if ( $args{xml_declaration} ) {
        $ret = $self->xml_declaration() . "\n" . $ret;
    }

    if ( $args{http_headers} ) {
        $ret =
            "Content-Type: " . $self->media_type() . "\n"
          . "Content-Length: " . length($ret) . "\n\n"
          . $ret;
    }

    return $ret;
}

# Returns the XML declaration for the output document.
sub xml_declaration {
    my ( $self, $xml_version, $output_encoding ) = @_;

    $xml_version     ||= $self->default_xml_version();
    $output_encoding ||= $self->output_encoding();

    return qq{<?xml version="$xml_version" encoding="$output_encoding"?>};
}

sub output_encoding {
    my ( $self, $encoding ) = @_;

    if ( defined $encoding ) {
        $self->{OUTPUT_ENCODING} = $encoding;
    }

    return exists $self->{OUTPUT_ENCODING} ?
$self->{OUTPUT_ENCODING} : 'UTF-8'; } sub doctype_system { my ( $self, $doctype ) = @_; if ( defined $doctype ) { $self->{DOCTYPE_SYSTEM} = $doctype; } return $self->{DOCTYPE_SYSTEM}; } sub doctype_public { my ( $self, $doctype ) = @_; if ( defined $doctype ) { $self->{DOCTYPE_PUBLIC} = $doctype; } return $self->{DOCTYPE_PUBLIC}; } sub result_document() { my ( $self, $document ) = @_; if ( defined $document ) { $self->{RESULT_DOCUMENT} = $document; } return $self->{RESULT_DOCUMENT}; } sub debug { my $self = shift; my $arg = shift || ""; if ($self->{DEBUG} and $self->{DEBUG} > 1 ) { $arg = (caller(1))[3] . ": $arg"; } print STDERR " " x $self->{INDENT}, "$arg\n" if $self->{DEBUG}; } sub warn { my $self = shift; my $arg = shift || ""; print STDERR " " x $self->{INDENT}, "$arg\n" if $self->{DEBUG}; print STDERR "$arg\n" if $self->{WARNINGS} && !$self->{DEBUG}; } sub open_xml { my $self = shift; my $class = ref $self || croak "Not a method call"; my %args = $self->__parse_args(@_); if ( defined $self->xml_document() && not $self->{XML_PASSED_AS_DOM} ) { $self->debug("flushing old XML::DOM::Document object..."); $self->xml_document()->dispose; } $self->{XML_PASSED_AS_DOM} = 1 if ref $args{Source} eq 'XML::DOM::Document'; if ( defined $self->result_document() ) { $self->debug("flushing result..."); $self->result_document()->dispose(); } $self->debug("opening xml..."); $args{parser_args} ||= {}; my $xml_document = $self->__open_document( Source => $args{Source}, base => $self->{XML_BASE}, parser_args => { %{ $self->{PARSER_ARGS} }, %{ $args{parser_args} } }, ); $self->xml_document($xml_document); $self->{XML_BASE} = dirname( URI->new_abs( $args{Source}, $self->{XML_BASE} )->as_string ) . 
'/'; $self->result_document( $self->xml_document()->createDocumentFragment ); } sub xml_document { my ( $self, $xml_document ) = @_; if ( defined $xml_document ) { $self->{XML_DOCUMENT} = $xml_document; } return $self->{XML_DOCUMENT}; } sub open_xsl { my $self = shift; my $class = ref $self || croak "Not a method call"; my %args = $self->__parse_args(@_); $self->xsl_document()->dispose if not $self->{XSL_PASSED_AS_DOM} and defined $self->xsl_document(); $self->{XSL_PASSED_AS_DOM} = 1 if ref $args{Source} eq 'XML::DOM::Document'; # open new document # open new document $self->debug("opening xsl..."); $args{parser_args} ||= {}; my $xsl_document = $self->__open_document( Source => $args{Source}, base => $self->{XSL_BASE}, parser_args => { %{ $self->{PARSER_ARGS} }, %{ $args{parser_args} } }, ); $self->xsl_document($xsl_document); $self->{XSL_BASE} = dirname( URI->new_abs( $args{Source}, $self->{XSL_BASE} )->as_string ) . '/'; $self->__preprocess_stylesheet; } sub xsl_document { my ( $self, $xsl_document ) = @_; if ( defined $xsl_document ) { $self->{XSL_DOCUMENT} = $xsl_document; } return $self->{XSL_DOCUMENT}; } # Argument parsing with backwards compatibility. sub __parse_args { my $self = shift; my %args; if ( @_ % 2 ) { $args{Source} = shift; %args = ( %args, @_ ); } else { %args = @_; if ( not exists $args{Source} ) { my $name = [ caller(1) ]->[3]; carp "Argument syntax of call to $name deprecated. See the documentation for $name" unless $self->use_deprecated($args{use_deprecated}) or exists $deprecation_used{$name}; $deprecation_used{$name} = 1; %args = (); $args{Source} = shift; shift; %args = ( %args, @_ ); } } return %args; } # private auxiliary function # sub __my_tag_compression { my ( $tag, $elem ) = @_; =begin internal_docs __my_tag_compression__( $tag, $elem ) A function for DOM::XML::setTagCompression to determine the style for printing of empty tags and empty container tags. XML::XSLT implements an XHTML-friendly style. 
Allow tag to be preceded by a namespace: ([\w\.]+\:){0,1}

    <br> -> <br />

    or

    <myns:hr> -> <myns:hr />

Empty tag list obtained from:

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

According to "Appendix C. HTML Compatibility Guidelines",
C.3 Element Minimization and Empty Element Content:

    Given an empty instance of an element whose content model is not
    EMPTY (for example, an empty title or paragraph) do not use the
    minimized form (e.g. use <p> </p> and not <p />).

However, the <p>
    tag is processed like an empty tag here! Tags allowed: base meta link hr br param img area input col Special Case: p (even though it violates C.3) The tags are matched in order of expected common occurence. =end internal_docs =cut $tag = [ split ':', $tag ]->[1] if index( $tag, ':' ) >= 0; return 2 if $tag =~ m/^(p|br|img|hr|input|meta|base|link|param|area|col)$/i; # Print other empty tags like this: return 1; } # private auxiliary function # sub __preprocess_stylesheet { my $self = $_[0]; $self->debug("preprocessing stylesheet..."); $self->__get_first_element; $self->__extract_namespaces; $self->__get_stylesheet; # Why is this here when __get_first_element does, apparently, the same thing? # Because, in __get_stylesheet we warp the document. $self->top_xsl_node( $self->xsl_document()->getFirstChild ); $self->__expand_xsl_includes; $self->__extract_top_level_variables; $self->__add_default_templates; $self->__cache_templates; # speed optim $self->__set_xsl_output; } sub top_xsl_node { my ( $self, $top_xsl_node ) = @_; if ( defined $top_xsl_node ) { $self->{TOP_XSL_NODE} = $top_xsl_node; } return $self->{TOP_XSL_NODE}; } # private auxiliary function # sub __get_stylesheet { my $self = shift; my $stylesheet; my $xsl_ns = $self->xsl_ns(); my $xsl = $self->xsl_document(); foreach my $child ( $xsl->getElementsByTagName( '*', 0 ) ) { my ( $ns, $tag ) = split( ':', $child->getTagName() ); if ( not defined $tag ) { $tag = $ns; $ns = $self->default_ns(); } if ( $tag eq 'stylesheet' || $tag eq 'transform' ) { if ( my $attributes = $child->getAttributes() ) { my $version = $attributes->getNamedItem('version'); $self->xslt_version( $version->getNodeValue() ) if $version; } $stylesheet = $child; last; } } if ( !$stylesheet ) { # stylesheet is actually one complete template! 
# put it in a template-element $stylesheet = $xsl->createElement("${xsl_ns}stylesheet"); my $template = $xsl->createElement("${xsl_ns}template"); $template->setAttribute( 'match', "/" ); my $template_content = $xsl->getElementsByTagName( '*', 0 )->item(0); $xsl->replaceChild( $stylesheet, $template_content ); $stylesheet->appendChild($template); $template->appendChild($template_content); } $self->xsl_document($stylesheet); } sub xslt_version { my ( $self, $xslt_version ) = @_; if ( defined $xslt_version ) { $self->{XSLT_VERSION} = $xslt_version; } return $self->{XSLT_VERSION} ||= '1.0'; } # private auxiliary function # sub __get_first_element { my ($self) = @_; my $node = $self->xsl_document()->getFirstChild(); $node = $node->getNextSibling until ref $node eq 'XML::DOM::Element'; $self->top_xsl_node($node); } # private auxiliary function # sub __extract_namespaces { my ($self) = @_; my $attr = $self->top_xsl_node()->getAttributes; if ( defined $attr ) { foreach my $attribute ( $self->top_xsl_node()->getAttributes->getValues ) { my ( $pre, $post ) = split( ":", $attribute->getName, 2 ); my $value = $attribute->getValue; # Take care of namespaces if ( $pre eq 'xmlns' and not defined $post ) { $self->default_ns(''); $self->{NAMESPACE}->{ $self->default_ns() }->{namespace} = $value; $self->xsl_ns('') if $value eq NS_XSLT; $self->debug( "Namespace `" . $self->default_ns() . "' = `$value'" ); } elsif ( $pre eq 'xmlns' ) { $self->{NAMESPACE}->{$post}->{namespace} = $value; $self->xsl_ns("$post:") if $value eq NS_XSLT; $self->debug("Namespace `$post:' = `$value'"); } else { $self->default_ns(''); } # Take care of versions if ( $pre eq "version" and not defined $post ) { $self->{NAMESPACE}->{ $self->default_ns() }->{version} = $value; $self->debug( "Version for namespace `" . $self->default_ns() . 
"' = `$value'" ); } elsif ( $pre eq "version" ) { $self->{NAMESPACE}->{$post}->{version} = $value; $self->debug("Version for namespace `$post:' = `$value'"); } } } if ( not defined $self->default_ns() ) { my ($dns) = split( ':', $self->top_xsl_node()->getTagName ); $self->default_ns($dns); } $self->debug( "Default Namespace: `" . $self->default_ns() . "'" ); $self->xsl_ns( $self->default_ns() ) unless $self->xsl_ns(); $self->debug( "XSL Namespace: `" . $self->xsl_ns() . "'" ); # ** FIXME: is this right? $self->{NAMESPACE}->{ $self->default_ns() }->{namespace} ||= NS_XHTML; } sub default_ns { my ( $self, $default_ns ) = @_; if ( defined $default_ns ) { $self->{DEFAULT_NS} = $default_ns; } return exists $self->{DEFAULT_NS} ? $self->{DEFAULT_NS} : undef; } sub xsl_ns { my ( $self, $prefix ) = @_; if ( defined $prefix ) { $prefix .= ':' unless $prefix =~ /:$/; $self->{XSL_NS} = $prefix; } return $self->{XSL_NS}; } # private auxiliary function # sub __expand_xsl_includes { my $self = shift; foreach my $include_node ( $self->top_xsl_node() ->getElementsByTagName( $self->xsl_ns() . "include" ) ) { my $include_file = $include_node->getAttribute('href'); die "include tag carries no selection!" 
unless defined $include_file; my $include_doc; eval { my $tmp_doc = $self->__open_by_filename( $include_file, $self->{XSL_BASE} ); $include_doc = $tmp_doc->getFirstChild->cloneNode(1); $tmp_doc->dispose; }; die "parsing of $include_file failed: $@" if $@; $self->debug("inserting `$include_file'"); $include_doc->setOwnerDocument( $self->xsl_document() ); $self->top_xsl_node()->replaceChild( $include_doc, $include_node ); $include_doc->dispose; } } # private auxiliary function # sub __extract_top_level_variables { my $self = $_[0]; $self->debug("Extracting variables"); foreach my $child ( $self->xsl_document()->getChildNodes() ) { next unless $child->getNodeType() == ELEMENT_NODE; my $name = $child->getNodeName(); my ( $ns, $tag ) = split( ':', $name ); $self->debug("$ns $tag"); if ( 1 ) # ( $tag eq '' && $self->xsl_ns() eq '' ) # || $self->xsl_ns() eq $ns ) { $tag = $ns if $tag eq ''; $self->debug($tag); if ( $tag eq 'variable' || $tag eq 'param' ) { my $name = $child->getAttribute("name"); if ($name) { $self->debug("got $tag called $name"); my $value = $child->getAttributeNode("select"); if ( !defined $value ) { if ( $child->getChildNodes()->getLength() ) { my $result = $self->xml_document()->createDocumentFragment; $self->_evaluate_template( $child, $self->xml_document(), '', $result ); $value = $self->_string($result); $result->dispose(); } } else { $value = $value->getValue(); if ( $value =~ /'(.*)'/ ) { $value = $1; } } unless ( !defined $value ) { $self->debug("Setting $tag `$name' = `$value'"); $self->{VARIABLES}->{$name} = $value; } } else { # Required, so we die (http://www.w3.org/TR/xslt#variables) die "$tag tag carries no name!"; } } } } } # private auxiliary function # sub __add_default_templates { my $self = $_[0]; my $doc = $self->top_xsl_node()->getOwnerDocument; # create template for '*' and '/' my $elem_template = $doc->createElement( $self->xsl_ns() . 
"template" ); $elem_template->setAttribute( 'match', '*|/' ); # $elem_template->appendChild( $doc->createElement( $self->xsl_ns() . "apply-templates" ) ); # create template for 'text()' and '@*' my $attr_template = $doc->createElement( $self->xsl_ns() . "template" ); $attr_template->setAttribute( 'match', 'text()|@*' ); # $attr_template->appendChild( $doc->createElement( $self->xsl_ns() . "value-of" ) ); $attr_template->getFirstChild->setAttribute( 'select', '.' ); # create template for 'processing-instruction()' and 'comment()' my $pi_template = $doc->createElement( $self->xsl_ns() . "template" ); $pi_template->setAttribute( 'match', 'processing-instruction()|comment()' ); $self->debug("adding default templates to stylesheet"); # add them to the stylesheet $self->xsl_document()->insertBefore( $pi_template, $self->top_xsl_node ); $self->xsl_document() ->insertBefore( $attr_template, $self->top_xsl_node() ); $self->xsl_document() ->insertBefore( $elem_template, $self->top_xsl_node() ); } sub templates { my ( $self, $templates ) = @_; if ( defined $templates ) { $self->{TEMPLATE} = $templates; } unless ( exists $self->{TEMPLATE} ) { $self->{TEMPLATE} = []; my $xsld = $self->xsl_document(); my $tag = $self->xsl_ns() . 'template'; @{ $self->{TEMPLATE} } = $xsld->getElementsByTagName($tag); } return wantarray ? @{ $self->{TEMPLATE} } : $self->{TEMPLATE}; } # private auxiliary function # sub __cache_templates { my $self = $_[0]; # pre-cache template names and matches # # reversing the template order is much more efficient # foreach my $template ( reverse $self->templates() ) { if ( $template->getParentNode->getTagName =~ /^([\w\.\-]+\:){0,1}(stylesheet|transform|include)/ ) { my $match = $template->getAttribute('match') || ''; my $name = $template->getAttribute('name') || ''; push( @{ $self->{TEMPLATE_MATCH} }, $match ); push( @{ $self->{TEMPLATE_NAME} }, $name ); } } } =item xsl_output_method Get or set the {METHOD} = $method; } return exists $self->{METHOD} ? 
$self->{METHOD} : 'xml'; } # private auxiliary function # sub __set_xsl_output { my $self = $_[0]; # default settings $self->media_type('text/xml'); # extraction of top-level xsl:output tag my ($output) = $self->xsl_document() ->getElementsByTagName( $self->xsl_ns() . "output", 0 ); if ( defined $output ) { # extraction and processing of the attributes my $attribs = $output->getAttributes; my $media = $attribs->getNamedItem('media-type'); my $method = $attribs->getNamedItem('method'); $self->media_type( $media->getNodeValue ) if defined $media; $self->xsl_output_method($method->getNodeValue) if defined $method; if ( my $omit = $attribs->getNamedItem('omit-xml-declaration') ) { if ( $omit->getNodeValue() =~ /^(yes|no)$/ ) { $self->omit_xml_declaration($1); } else { # I would say that this should be fatal # Perhaps there should be a 'strict' option to the constructor my $m = qq{Wrong value for attribute "omit-xml-declaration" in\n\t} . $self->xsl_ns() . qq{output, should be "yes" or "no"}; $self->warn($m); } } unless ( $self->omit_xml_declaration() ) { my $output_ver = $attribs->getNamedItem('version'); my $output_enc = $attribs->getNamedItem('encoding'); $self->output_version( $output_ver->getNodeValue ) if defined $output_ver; $self->output_encoding( $output_enc->getNodeValue ) if defined $output_enc; if ( not $self->output_version() || not $self->output_encoding() ) { $self->warn( qq{Expected attributes "version" and "encoding" in\n\t} . $self->xsl_ns() . "output" ); } } my $doctype_public = $attribs->getNamedItem('doctype-public'); my $doctype_system = $attribs->getNamedItem('doctype-system'); my $dp = defined $doctype_public ? $doctype_public->getNodeValue : ''; $self->doctype_public($dp); my $ds = defined $doctype_system ? $doctype_system->getNodeValue : ''; $self->doctype_system($ds); # cdata-section-elements should only be used if the output type # is XML but as we are not checking that right now ... 
my $cdata_section = $attribs->getNamedItem('cdata-section-elements'); if ( defined $cdata_section ) { my $cdata_sections = []; @{$cdata_sections} = split /\s+/, $cdata_section->getNodeValue(); $self->cdata_sections($cdata_sections); } } else { $self->debug("Default Output options being used"); } } sub omit_xml_declaration { my ( $self, $omit_xml_declaration ) = @_; if ( defined $omit_xml_declaration ) { if ( $omit_xml_declaration =~ /^(yes|no)$/ ) { $self->{OMIT_XML_DECL} = ( $1 eq 'yes' ); } else { $self->{OMIT_XML_DECL} = $omit_xml_declaration ? 1 : 0; } } return exists $self->{OMIT_XML_DECL} ? $self->{OMIT_XML_DECL} : 0; } sub cdata_sections { my ( $self, $cdata_sections ) = @_; if ( defined $cdata_sections ) { $self->{CDATA_SECTIONS} = $cdata_sections; } $self->{CDATA_SECTIONS} = [] unless exists $self->{CDATA_SECTIONS}; return wantarray() ? @{ $self->{CDATA_SECTIONS} } : $self->{CDATA_SECTIONS}; } sub is_cdata_section { my ( $self, $element ) = @_; my %cdata_sections; my @cdata_temp = $self->cdata_sections(); @cdata_sections{@cdata_temp} = (1) x @cdata_temp; my $tagname; if ( defined $element and ref($element) and ref($element) eq 'XML::DOM' ) { $tagname = $element->getTagName(); } else { $tagname = $element; } # Will need to do namespace checking on this really return exists $cdata_sections{$tagname} ? 1 : 0; } sub output_version { my ( $self, $output_version ) = @_; if ( defined $output_version ) { $self->{OUTPUT_VERSION} = $output_version; } return exists $self->{OUTPUT_VERSION} ? $self->{OUTPUT_VERSION} : $self->default_xml_version(); } sub __get_attribute_sets { my ($self) = @_; my $doc = $self->xsl_document(); my $nsp = $self->xsl_ns(); my $tagname = $nsp . 
'attribute-set'; my %inc; my @included; foreach my $attribute_set ( $doc->getElementsByTagName( $tagname, 0 ) ) { my $attribs = $attribute_set->getAttributes(); next unless defined $attribs; my $name_attr = $attribs->getNamedItem('name'); next unless defined $name_attr; my $name = $name_attr->getValue(); $self->debug("processing attribute-set $name"); if ( my $uas = $attribs->getNamedItem('use-attribute-sets') ) { $self->_indent(); $inc{$name} = $uas->getValue(); $self->debug("Attribute set $name includes $inc{$name}"); push @included, $name; $self->_outdent(); } my $attr_set = {}; my $tagname = $nsp . 'attribute'; foreach my $attribute ( $attribute_set->getElementsByTagName( $tagname, 0 ) ) { my $attribs = $attribute->getAttributes(); next unless defined $attribs; my $name_attr = $attribs->getNamedItem('name'); next unless defined $name_attr; my $attr_name = $name_attr->getValue(); $self->debug("Processing attribute $attr_name"); if ($attr_name) { my $result = $self->xml_document()->createDocumentFragment(); $self->_evaluate_template( $attribute, $self->xml_document(), '/', $result ); # might need variables my $value = $self->fix_attribute_value( $self->__string__($result) ); $attr_set->{$attr_name} = $value; $result->dispose(); $self->debug("Adding attribute $attr_name with value $value"); } } $self->__attribute_set_( $name, $attr_set ); } foreach my $as (@included ) { $self->_indent(); $self->debug("adding attributes from $inc{$as} to $as"); my %fix = (%{$self->__attribute_set_($as)},%{$self->__attribute_set_($inc{$as})}); $self->__attribute_set_($as,\%fix); $self->_outdent(); } } # Accessor for attribute sets sub __attribute_set_ { my ( $self, $name, $attr_hash ) = @_; if ( defined $attr_hash && defined $name ) { if ( exists $self->{ATTRIBUTE_SETS}->{$name} ) { %{$self->{ATTRIBUTE_SETS}->{$name}} = ( %{$self->{ATTRIBUTE_SETS}->{$name}}, %{$attr_hash}); } else { $self->{ATTRIBUTE_SETS}->{$name} = $attr_hash; } } return defined $name && exists 
$self->{ATTRIBUTE_SETS}->{$name} ? $self->{ATTRIBUTE_SETS}->{$name} : undef; } sub open_project { my $self = shift; my $xml = shift; my $xsl = shift; my ( $xmlflag, $xslflag, %args ) = @_; carp "open_project is deprecated." unless $self->use_deprecated() or exists $deprecation_used{open_project}; $deprecation_used{open_project} = 1; $self->debug("opening project:"); $self->_indent(); $self->open_xml( $xml, %args ); $self->open_xsl( $xsl, %args ); $self->debug("done..."); $self->_outdent(); } sub transform { my $self = shift; if ( keys %{$self->{VARIABLES}} ) { $self->debug("Adding variables"); push @_,'variables', $self->{VARIABLES}; } my %topvariables = $self->__parse_args(@_); $self->debug("transforming document:"); $self->_indent(); $self->open_xml(%topvariables); $self->debug("done..."); $self->_outdent(); # The _get_attribute_set needs an open XML document $self->_indent(); $self->__get_attribute_sets(); $self->_outdent(); $self->debug("processing project:"); $self->_indent(); $self->process(%topvariables); $self->debug("done!"); $self->_outdent(); $self->result_document()->normalize(); return $self->result_document(); } sub process { my ( $self, %topvariables ) = @_; $self->debug("processing project:"); $self->_indent(); my $root_template = $self->_match_template( "match", '/', 1, '' ); $self->debug(join ' ', keys %topvariables); %topvariables = ( defined $topvariables{variables} ? %{$topvariables{variables}} : (), defined $self->{VARIABLES} && ref $self->{VARIABLES} && ref $self->{VARIABLES} eq 'ARRAY' ? @{ $self->{VARIABLES} } : () ); $self->debug(join ' ', keys %topvariables); $self->_evaluate_template( $root_template, # starting template: the root template $self->xml_document(), '', # current XML selection path: the root $self->result_document(), # current result tree node: the root { () }, # current known variables: none \%topvariables # previously known variables: top level variables ); $self->debug("done!"); $self->_outdent(); } # Handles deprecations. 
sub AUTOLOAD { my $self = shift; my $type = ref($self) || croak "Not a method call"; my $name = $AUTOLOAD; $name =~ s/.*://; my %deprecation = ( 'output_string' => 'toString', 'result_string' => 'toString', 'output' => 'toString', 'result' => 'toString', 'result_mime_type' => 'media_type', 'output_mime_type' => 'media_type', 'result_tree' => 'to_dom', 'output_tree' => 'to_dom', 'transform_document' => 'transform', 'process_project' => 'process' ); if ( exists $deprecation{$name} ) { carp "$name is deprecated. Use $deprecation{$name}" unless $self->use_deprecated() or exists $deprecation_used{$name}; $deprecation_used{$name} = 1; eval qq{return \$self->$deprecation{$name}(\@_)}; } else { croak "$name: No such method name"; } } sub _my_print_text { my ( $self, $FILE ) = @_; if ( UNIVERSAL::isa( $self, "XML::DOM::CDATASection" ) ) { $FILE->print( $self->getData() ); } else { $FILE->print( XML::DOM::encodeText( $self->getData(), "<&" ) ); } } sub toString { my $self = $_[0]; local $^W; local *XML::DOM::Text::print = \&_my_print_text; my $string = $self->result_document()->toString(); return $string; } sub to_dom { my ($self) = @_; return $self->result_document(); } sub media_type { my ( $self, $media_type ) = @_; if ( defined $media_type ) { $self->{MEDIA_TYPE} = $media_type; } return $self->{MEDIA_TYPE}; } sub print_output { my ( $self, $file, $mime ) = @_; $file ||= ''; # print to STDOUT by default $mime = 1 unless defined $mime; # print mime-type header etc by default # $self->{RESULT_DOCUMENT}->printToFileHandle (\*STDOUT); # or $self->{RESULT_DOCUMENT}->print (\*STDOUT); ??? # exit; carp "print_output is deprecated. Use serve." unless $self->use_deprecated() or exists $deprecation_used{print_output}; $deprecation_used{print_output} = 1; if ($mime) { print "Content-type: " . $self->media_type() . 
"\n\n"; if ( $self->xsl_output_method =~ /(?:xml|html)/ ) { unless ( $self->omit_xml_declaration() ) { print $self->xml_declaration(), "\n"; } } if ( my $doctype = $self->doctype() ) { print "$doctype\n"; } } if ($file) { if ( ref( \$file ) eq 'SCALAR' ) { print $file $self->output_string, "\n"; } else { if ( open( FILE, ">$file" ) ) { print FILE $self->output_string, "\n"; if ( !close(FILE) ) { die("Error writing $file: $!. Nothing written...\n"); } } else { die("Error opening $file: $!. Nothing done...\n"); } } } else { print $self->output_string, "\n"; } } *print_result = *print_output; sub doctype { my ($self) = @_; my $doctype = ""; if ( $self->doctype_public() || $self->doctype_system() ) { my $root_name = $self->result_document()->getElementsByTagName( '*', 0 )->item(0) ->getTagName; if ( $self->doctype_public() ) { $doctype = qq{doctype_public() . qq{" "} . $self->doctype_system() . qq{">}; } else { $doctype = qq{doctype_system() . qq{">}; } } $self->debug("returning doctype of $doctype"); return $doctype; } sub dispose { #my $self = $_[0]; #$_[0]->[PARSER] = undef if (defined $_[0]->[PARSER]); $_[0]->result_document()->dispose if ( defined $_[0]->result_document() ); # only dispose xml and xsl when they were not passed as DOM if ( not defined $_[0]->{XML_PASSED_AS_DOM} && defined $_ - [0]->xml_document() ) { $_[0]->xml_document()->dispose; } if ( not defined $_[0]->{XSL_PASSED_AS_DOM} && defined $_ - [0]->xsl_document() ) { $_[0]->xsl_document()->dispose; } $_[0] = undef; } ###################################################################### # PRIVATE DEFINITIONS sub __open_document { my $self = shift; my %args = @_; %args = ( %{ $self->{PARSER_ARGS} }, %args ); my $doc; $self->debug("opening document"); eval { my $ref = ref( $args{Source} ); if ( !$ref && length $args{Source} < 255 && ( -f $args{Source} || lc( substr( $args{Source}, 0, 5 ) ) eq 'http:' || lc( substr( $args{Source}, 0, 6 ) ) eq 'https:' || lc( substr( $args{Source}, 0, 4 ) ) eq 'ftp:' || 
lc( substr( $args{Source}, 0, 5 ) ) eq 'file:' ) ) { # Filename $self->debug("Opening URL"); $doc = $self->__open_by_filename( $args{Source}, $args{base} ); } elsif ( !$ref ) { # String $self->debug("Opening String"); $doc = $self->{PARSER}->parse( $args{Source} ); } elsif ( $ref eq "SCALAR" ) { # Stringref $self->debug("Opening Stringref"); $doc = $self->{PARSER}->parse( ${ $args{Source} } ); } elsif ( $ref eq "XML::DOM::Document" ) { # DOM object $self->debug("Opening XML::DOM"); $doc = $args{Source}; } elsif ( $ref eq "GLOB" ) { # This is a file glob $self->debug("Opening GLOB"); my $ioref = *{ $args{Source} }{IO}; $doc = $self->{PARSER}->parse($ioref); } elsif ( UNIVERSAL::isa( $args{Source}, 'IO::Handle' ) ) { # IO::Handle $self->debug("Opening IO::Handle"); $doc = $self->{PARSER}->parse( $args{Source} ); } else { $doc = undef; } }; die "Error while parsing: $@\n" . $args{Source} if $@; return $doc; } # private auxiliary function # sub __open_by_filename { my ( $self, $filename, $base ) = @_; my $doc; # ** FIXME: currently reads the whole document into memory # might not be avoidable # LWP should be able to deal with files as well as links $ENV{DOMAIN} ||= "example.com"; # hide complaints from Net::Domain my $file = get( URI->new_abs( $filename, $base ) ); return $self->{PARSER}->parse( $file, %{ $self->{PARSER_ARGS} } ); } sub _match_template { my ( $self, $attribute_name, $select_value, $xml_count, $xml_selection_path, $mode ) = @_; $mode ||= ""; my $template = ""; my @template_matches = (); $self->debug( qq{matching template for "$select_value" with count $xml_count\n\t} . 
qq{and path "$xml_selection_path":} );

    if ( $attribute_name eq "match" && ref $self->{TEMPLATE_MATCH} ) {
        push @template_matches, @{ $self->{TEMPLATE_MATCH} };
    }
    elsif ( $attribute_name eq "name" && ref $self->{TEMPLATE_NAME} ) {
        push @template_matches, @{ $self->{TEMPLATE_NAME} };
    }

    # note that the order of @template_matches is the reverse of $self->{TEMPLATE}
    my $count = @template_matches;
    foreach my $original_match (@template_matches) {

        # templates with no match or name, or with both simultaneously,
        # have no $template_match value
        if ($original_match) {
            my $full_match = $original_match;

            # multiple match? (for example: match="*|/")
            while ( $full_match =~ s/^(.+?)\|// ) {
                my $match = $1;
                if (
                    &__template_matches__(
                        $match, $select_value,
                        $xml_count, $xml_selection_path
                    )
                  )
                {
                    $self->debug(
                        qq{ found #$count with "$match" in "$original_match"});

                    $template = ( $self->templates() )[ $count - 1 ];
                    return $template;

                    # last;
                }
            }

            # last match?
            if ( !$template ) {
                if (
                    &__template_matches__(
                        $full_match, $select_value,
                        $xml_count,  $xml_selection_path
                    )
                  )
                {
                    $self->debug(
                        qq{ found #$count with "$full_match" in "$original_match"}
                    );
                    $template = ( $self->templates() )[ $count - 1 ];
                    return $template;

                    # last;
                }
                else {
                    $self->debug(qq{ #$count "$original_match" did not match});
                }
            }
        }
        $count--;
    }

    if ( !$template ) {
        $self->warn(qq{No template matching `$xml_selection_path' found !!});
    }

    return $template;
}

# auxiliary function #
sub __template_matches__ {
    my ( $template, $select, $count, $path ) = @_;

    my $nocount_path = $path;
    $nocount_path =~ s/\[.*?\]//g;

    if (   ( $template eq $select )
        || ( $template eq $path )
        || ( $template eq "$select\[$count\]" )
        || ( $template eq "$path\[$count\]" ) )
    {

        # perfect match or path ends with templates match
        #print "perfect match","\n";
        return "True";
    }
    elsif (
        ( $template eq substr( $path, -length($template) ) )
        || ( $template eq substr( $nocount_path, -length($template) ) )
        || ( "$template\[$count\]" eq substr( $path, -length($template) ) )
        || (
"$template\[$count\]" eq substr( $nocount_path, -length($template) ) ) ) { # template matches tail of path matches perfectly #print "perfect tail match","\n"; return "True"; } elsif ( $select =~ /\[\s*(\@.*?)\s*=\s*(.*?)\s*\]$/ ) { # match attribute test my $attribute = $1; my $value = $2; return ""; # False, no test evaluation yet # } elsif ( $select =~ /\[\s*(.*?)\s*=\s*(.*?)\s*\]$/ ) { # match test my $element = $1; my $value = $2; return ""; # False, no test evaluation yet # } elsif ( $select =~ /(\@\*|\@[\w\.\-\:]+)$/ ) { # match attribute my $attribute = $1; #print "attribute match?\n"; return ( ( $template eq '@*' ) || ( $template eq $attribute ) || ( $template eq "\@*\[$count\]" ) || ( $template eq "$attribute\[$count\]" ) ); } elsif ( $select =~ /(\*|[\w\.\-\:]+)$/ ) { # match element my $element = $1; #print "element match?\n"; return ( ( $template eq "*" ) || ( $template eq $element ) || ( $template eq "*\[$count\]" ) || ( $template eq "$element\[$count\]" ) ); } else { return ""; # False # } } sub _evaluate_test { my ( $self, $test, $current_xml_node, $current_xml_selection_path, $variables ) = @_; $self->debug("Doing test $test"); if ( $test =~ /^(.+)\/\[(.+)\]$/ ) { my $path = $1; $test = $2; $self->debug("evaluating test $test at path $path:"); $self->_indent(); my $node = $self->_get_node_set( $path, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); if (@$node) { $current_xml_node = $$node[0]; } else { return ""; } $self->_outdent(); } else { $self->debug("evaluating path or test $test:"); my $node = $self->_get_node_set( $test, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables, "silent" ); $self->_indent(); if (@$node) { $self->debug("path exists!"); return "true"; } else { $self->debug("not a valid path, evaluating as test"); } $self->_outdent(); } $self->_indent(); my $result = $self->__evaluate_test__( $test, $current_xml_selection_path, $current_xml_node, $variables ); 
$self->debug("test evaluates @{[ $result ? 'true': 'false']}"); $self->_outdent(); return $result; } sub _evaluate_template { my ( $self, $template, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->debug( qq{evaluating template content with current path } . qq{"$current_xml_selection_path": } ); $self->_indent(); die "No Template" unless defined $template && ref $template; $template->normalize; foreach my $child ( $template->getChildNodes ) { my $ref = ref $child; $self->debug("$ref"); $self->_indent(); my $node_type = $child->getNodeType; if ( $node_type == ELEMENT_NODE ) { $self->_evaluate_element( $child, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $node_type == TEXT_NODE ) { my $value = $child->getNodeValue; if ( length($value) and $value !~ /^[\x20\x09\x0D\x0A]+$/s ) { $self->_add_node( $child, $current_result_node ); } } elsif ( $node_type == CDATA_SECTION_NODE ) { my $text = $self->xml_document()->createTextNode( $child->getData ); $self->_add_node( $text, $current_result_node ); } elsif ( $node_type == ENTITY_REFERENCE_NODE ) { $self->_add_node( $child, $current_result_node ); } elsif ( $node_type == DOCUMENT_TYPE_NODE ) { # skip # $self->debug("Skipping Document Type node..."); } elsif ( $node_type == COMMENT_NODE ) { # skip # $self->debug("Skipping Comment node..."); } else { $self->warn( "evaluate-template: Dunno what to do with node of type $ref !!!\n\t" . "($current_xml_selection_path)" ); } $self->_outdent(); } $self->debug("done!"); $self->_outdent(); } sub _add_node { my ( $self, $node, $parent, $deep, $owner ) = @_; $owner ||= $self->xml_document(); my $what = defined $deep ? 
'deep' : 'non-deep'; $self->debug("adding node ($what).."); $node = $node->cloneNode($deep); $node->setOwnerDocument($owner); if ( $node->getNodeType == ATTRIBUTE_NODE ) { $parent->setAttributeNode($node); } else { $parent->appendChild($node); } } sub _apply_templates { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $children; my $params = {}; my $newvariables = defined $variables ? {%$variables} : {}; my $select = $xsl_node->getAttribute('select'); if ( $select =~ /\$/ and defined $variables ) { # replacing occurences of variables: foreach my $varname ( keys(%$variables) ) { $self->debug("Applying variable $varname"); $select =~ s/[^\\]\$$varname/$$variables{$varname}/g; } } if ($select) { $self->debug( qq{applying templates on children select of "$current_xml_selection_path":} ); $children = $self->_get_node_set( $select, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); } else { $self->debug( qq{applying templates on all children of "$current_xml_selection_path":} ); $children = [ $current_xml_node->getChildNodes ]; } $self->_process_with_params( $xsl_node, $current_xml_node, $current_xml_selection_path, $variables, $params ); # process xsl:sort here $self->_indent(); my $count = 1; foreach my $child (@$children) { my $node_type = $child->getNodeType; if ( $node_type == DOCUMENT_TYPE_NODE ) { # skip # $self->debug("Skipping Document Type node..."); } elsif ( $node_type == DOCUMENT_FRAGMENT_NODE ) { # skip # $self->debug("Skipping Document Fragment node..."); } elsif ( $node_type == NOTATION_NODE ) { # skip # $self->debug("Skipping Notation node..."); } else { my $newselect = ""; my $newcount = $count; if ( !$select || ( $select eq '.' 
) )
            {
                if ( $node_type == ELEMENT_NODE ) {
                    $newselect = $child->getTagName;
                }
                elsif ( $node_type == ATTRIBUTE_NODE ) {

                    # concatenate explicitly: inside a double-quoted string,
                    # "@$child->getName" interpolates @$child as an array
                    $newselect = '@' . $child->getName;
                }
                elsif (( $node_type == TEXT_NODE )
                    || ( $node_type == ENTITY_REFERENCE_NODE ) )
                {
                    $newselect = "text()";
                }
                elsif ( $node_type == PROCESSING_INSTRUCTION_NODE ) {
                    $newselect = "processing-instruction()";
                }
                elsif ( $node_type == COMMENT_NODE ) {
                    $newselect = "comment()";
                }
                else {
                    my $ref = ref $child;
                    $self->debug("Unknown node encountered: `$ref'");
                }
            }
            else {
                $newselect = $select;
                if ( $newselect =~ s/\[(\d+)\]$// ) {
                    $newcount = $1;
                }
            }

            $self->_select_template(
                $child,                      $newselect,
                $newcount,                   $current_xml_node,
                $current_xml_selection_path, $current_result_node,
                $newvariables,               $params
            );
        }
        $count++;
    }

    # balance the _indent() issued before the loop
    $self->_outdent();
}

sub _for_each {
    my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path,
        $current_result_node, $variables, $oldvariables )
      = @_;

    my $ns     = $self->xsl_ns();
    my $select = $xsl_node->getAttribute('select')
      || die "No `select' attribute in for-each element";

    if ( $select =~ /\$/ ) {

        # replacing occurrences of variables:
        foreach my $varname ( keys(%$variables) ) {
            $select =~ s/[^\\]\$$varname/$$variables{$varname}/g;
        }
    }

    if ( defined $select ) {
        $self->debug(
qq{applying template for each child $select of "$current_xml_selection_path":}
        );

        my $children = $self->_get_node_set(
            $select,
            $self->xml_document(),
            $current_xml_selection_path,
            $current_xml_node, $variables
        );

        # xsl_ns() already ends in ':' (it is concatenated directly with tag
        # names elsewhere), so "$ns:sort" would look for "xsl::sort"
        my $sort = $xsl_node->getElementsByTagName( "${ns}sort", 0 );

        if ( my $nokeys = $sort->getLength() ) {
            $self->debug("going to sort with $nokeys");
        }

        $self->_indent();
        my $count = 1;
        foreach my $child (@$children) {
            my $node_type = $child->getNodeType;

            if ( $node_type == DOCUMENT_TYPE_NODE ) {

                # skip #
                $self->debug("Skipping Document Type node...");
            }
            elsif ( $node_type == DOCUMENT_FRAGMENT_NODE ) {

                # skip #
                $self->debug("Skipping Document Fragment node...");
            }
            elsif ( $node_type == NOTATION_NODE ) {

                # skip #
                $self->debug("Skipping
Notation node..."); } else { $self->_evaluate_template( $xsl_node, $child, "$current_xml_selection_path/$select\[$count\]", $current_result_node, $variables, $oldvariables ); } $count++; } $self->_outdent(); } else { $self->warn(qq%expected attribute "select" in <${ns}for-each>%); } } sub _select_template { my ( $self, $child, $select, $count, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $ref = ref $child; $self->debug( qq{selecting template $select for child type $ref of "$current_xml_selection_path":} ); $self->_indent(); my $child_xml_selection_path = "$current_xml_selection_path/$select"; my $template = $self->_match_template( "match", $select, $count, $child_xml_selection_path ); if ($template) { $self->_evaluate_template( $template, $child, "$child_xml_selection_path\[$count\]", $current_result_node, $variables, $oldvariables ); } else { $self->debug("skipping template selection..."); } $self->_outdent(); } sub _evaluate_element { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my ( $ns, $xsl_tag ) = split( ':', $xsl_node->getTagName ); if ( not defined $xsl_tag ) { $xsl_tag = $ns; $ns = $self->default_ns(); } else { $ns .= ':'; } $self->debug( qq{evaluating element `$xsl_tag' from `$current_xml_selection_path': }); $self->_indent(); if ( $ns eq $self->xsl_ns() ) { my @attributes = $xsl_node->getAttributes->getValues; $self->debug(qq{This is an xsl tag}); if ( $xsl_tag eq 'apply-templates' ) { $self->_apply_templates( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'attribute' ) { $self->_attribute( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'call-template' ) { $self->_call_template( $xsl_node, $current_xml_node, $current_xml_selection_path, 
$current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'choose' ) { $self->_choose( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'comment' ) { $self->_comment( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'copy' ) { $self->_copy( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'copy-of' ) { $self->_copy_of( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables ); } elsif ( $xsl_tag eq 'element' ) { $self->_element( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'for-each' ) { $self->_for_each( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } elsif ( $xsl_tag eq 'if' ) { $self->_if( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); # } elsif ($xsl_tag eq 'output') { } elsif ( $xsl_tag eq 'param' ) { $self->_variable( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables, 1 ); } elsif ( $xsl_tag eq 'processing-instruction' ) { $self->_processing_instruction( $xsl_node, $current_result_node ); } elsif ( $xsl_tag eq 'text' ) { $self->_text( $xsl_node, $current_result_node ); } elsif ( $xsl_tag eq 'value-of' ) { $self->_value_of( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables ); } elsif ( $xsl_tag eq 'variable' ) { $self->_variable( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables, 0 ); } elsif ( $xsl_tag eq 'sort' ) { $self->_sort( $xsl_node, $current_xml_node, $current_xml_selection_path, 
$current_result_node, $variables, $oldvariables, 0 ); } elsif ( $xsl_tag eq 'fallback' ) { $self->_fallback( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables, 0 ); } elsif ( $xsl_tag eq 'attribute-set' ) { $self->_attribute_set( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables, 0 ); } else { $self->_add_and_recurse( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } } else { $self->debug( $ns . " does not match " . $self->xsl_ns() ); # not entirely sure if this right but the spec is a bit vague if ( $self->is_cdata_section($xsl_tag) ) { $self->debug("This is a CDATA section element"); $self->_add_cdata_section( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } else { $self->debug("This is a literal element"); $self->_check_attributes_and_recurse( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } } $self->_outdent(); } sub _add_cdata_section { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $node = $self->xml_document()->createElement( $xsl_node->getTagName ); my $cdata = ''; foreach my $child_node ( $xsl_node->getChildNodes() ) { if ( $child_node->can('asString') ) { $cdata .= $child_node->asString(); } else { $cdata .= $child_node->getNodeValue(); } } $node->addCDATA($cdata); $current_result_node->appendChild($node); } sub _add_and_recurse { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; # the addition is commented out to prevent unknown xsl: commands to be printed in the result $self->_add_node( $xsl_node, $current_result_node ); $self->_evaluate_template( $xsl_node, $current_xml_node, 
$current_xml_selection_path, $current_result_node, $variables, $oldvariables ); #->getLastChild); } sub _check_attributes_and_recurse { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->_add_node( $xsl_node, $current_result_node ); $self->_attribute_value_of( $current_result_node->getLastChild, $current_xml_node, $current_xml_selection_path, $variables ); $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node->getLastChild, $variables, $oldvariables ); } sub _element { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $name = $xsl_node->getAttribute('name'); $self->debug(qq{inserting Element named "$name":}); $self->_indent(); if ( defined $name ) { my $result = $self->xml_document()->createElement($name); $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $result, $variables, $oldvariables ); $self->_apply_attribute_set($xsl_node,$result); $current_result_node->appendChild($result); } else { $self->warn( q{expected attribute "name" in <} . $self->xsl_ns() . q{element>} ); } $self->_outdent(); } sub _apply_attribute_set { my ( $self,$xsl_node, $output_node) = @_; my $attr_set = $xsl_node->getAttribute('use-attribute-sets'); if ($attr_set) { $self->_indent(); my $set_name = $attr_set; if ( my $set = $self->__attribute_set_($set_name) ) { $self->debug("Adding attribute-set '$set_name'"); foreach my $attr_name ( keys %{$set} ) { $self->debug( "Adding attribute $attr_name ->" . 
$set->{$attr_name} ); $output_node->setAttribute( $attr_name, $set->{$attr_name} ); } } $self->_outdent(); } } { ###################################################################### # Auxiliary package for disable-output-escaping ###################################################################### package XML::XSLT::DOM::TextDOE; use vars qw( @ISA ); @ISA = qw( XML::DOM::Text ); sub print { my ( $self, $FILE ) = @_; $FILE->print( $self->getData ); } } sub _value_of { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables ) = @_; my $select = $xsl_node->getAttribute('select'); # Need to determine here whether the value is an XPath expression # and act accordingly my $xml_node; if ( defined $select ) { $xml_node = $self->_get_node_set( $select, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); $self->debug("stripping node to text:"); $self->_indent(); my $text = ''; $text = $self->__string__( $xml_node->[0] ) if @{$xml_node}; $self->_outdent(); if ( $text ne '' ) { my $node = $self->xml_document()->createTextNode($text); if ( $xsl_node->getAttribute('disable-output-escaping') eq 'yes' ) { $self->debug("disabling output escaping"); bless $node, 'XML::XSLT::DOM::TextDOE'; } $self->_move_node( $node, $current_result_node ); } else { $self->debug("nothing left.."); } } else { $self->warn( qq{expected attribute "select" in <} . $self->xsl_ns() . 
q{value-of>} ); } } sub __strip_node_to_text__ { my ( $self, $node ) = @_; my $result = ""; my $node_type = $node->getNodeType; if ( $node_type == TEXT_NODE ) { $result = $node->getData; } elsif (( $node_type == ELEMENT_NODE ) || ( $node_type == DOCUMENT_FRAGMENT_NODE ) ) { $self->_indent(); foreach my $child ( $node->getChildNodes ) { $result .= &__strip_node_to_text__( $self, $child ); } $self->_outdent(); } return $result; } sub __string__ { my ( $self, $node, $depth ) = @_; my $result = ""; if ( defined $node ) { my $ref = ( ref($node) || "not a reference" ); $self->debug("stripping child nodes ($ref):"); $self->_indent(); if ( $ref eq "ARRAY" ) { return $self->__string__( $$node[0], $depth ); } else { my $node_type = $node->getNodeType; if ( ( $node_type == ELEMENT_NODE ) || ( $node_type == DOCUMENT_FRAGMENT_NODE ) || ( $node_type == DOCUMENT_NODE ) ) { foreach my $child ( $node->getChildNodes ) { $result .= &__string__( $self, $child, 1 ); } } elsif ( $node_type == ATTRIBUTE_NODE ) { $result .= $node->getValue; } elsif (( $node_type == TEXT_NODE ) || ( $node_type == CDATA_SECTION_NODE ) || ( $node_type == ENTITY_REFERENCE_NODE ) ) { $result .= $node->getData; } elsif ( !$depth && ( ( $node_type == PROCESSING_INSTRUCTION_NODE ) || ( $node_type == COMMENT_NODE ) ) ) { $result .= $node->getData; # COM,PI - only in 'top-level' call } else { # just to be consistent $self->warn("Can't get string-value for node of type $ref !"); } } $self->debug(qq{ "$result"}); $self->_outdent(); } else { $self->debug(" no result"); } return $result; } sub _move_node { my ( $self, $node, $parent ) = @_; $self->debug("moving node.."); $parent->appendChild($node); } sub _get_node_set { my ( $self, $path, $root_node, $current_path, $current_node, $variables, $silent ) = @_; $current_path ||= "/"; $current_node ||= $root_node; $silent ||= 0; %{$variables} = (%{$self->{VARIABLES}}, %{$variables}); $self->debug(qq{getting node-set "$path" from "$current_path"}); $self->_indent(); # 
expand abbreviated syntax
    $path =~ s/\@/attribute\:\:/g;
    $path =~ s/\.\./parent\:\:node\(\)/g;
    $path =~ s/\./self\:\:node\(\)/g;
    $path =~ s/\/\//\/descendant\-or\-self\:\:node\(\)\//g;

    #$path =~ s/\/[^\:\/]*?\//attribute::/g;

    if ( $path =~ /^\$([\w\.\-]+)$/ ) {
        my $varname = $1;
        $self->debug("looking for variable $varname");
        $self->debug(join ' ', keys %{$variables});
        my $var = $$variables{$varname};
        if ( defined $var ) {
            if ( ref( $$variables{$varname} ) eq 'ARRAY' ) {

                # node-set array-ref
                return $$variables{$varname};
            }
            elsif ( ref( $$variables{$varname} ) eq 'XML::DOM::NodeList' ) {

                # node-set nodelist
                return [ @{ $$variables{$varname} } ];
            }
            elsif (
                ref( $$variables{$varname} ) eq 'XML::DOM::DocumentFragment' )
            {

                # node-set documentfragment
                return [ $$variables{$varname}->getChildNodes ];
            }
            else {

                # string or number?
                return [
                    $self->xml_document()
                      ->createTextNode( $$variables{$varname} )
                ];
            }
        }
        else {

            # var does not exist
            return [];
        }
    }
    elsif ( $path eq $current_path || $path eq 'self::node()' ) {
        $self->debug("direct hit!");
        return [$current_node];
    }
    else {

        # open external documents first #
        if ( $path =~
            /^\s*document\s*\(["'](.*?)["']\s*(,\s*(.*)\s*){0,1}\)\s*(.*)$/ )
        {
            my $filename = $1;
            my $sec_arg  = $3;
            $path = ( $4 || "" );

            $self->debug(qq{external selection ("$filename")!});

            if ($sec_arg) {
                $self->warn("Ignoring second argument of $path");
            }

            ($root_node) =
              $self->__open_by_filename( $filename, $self->{XSL_BASE} );
        }

        if ( $path =~ /^\// ) {

            # start from the root #
            $current_node = $root_node;
        }
        elsif ( $path =~ /^self\:\:node\(\)\// ) {    #'#"#'#"

            # remove preceding dot from './etc', which is expanded to
            # 'self::node()' at the top of this subroutine #
            $path =~ s/^self\:\:node\(\)//;
        }
        else {

            # to facilitate parsing, precede path with a '/' #
            $path = "/$path";
        }

        $self->debug(qq{using "$path":});

        if ( $path eq '/' ) {
            $current_node = [$current_node];
        }
        else {
            $current_node =
              $self->__get_node_set__( $path, [$current_node], $silent );
        }

        $self->_outdent();
return $current_node; } } # auxiliary function # sub __get_node_set__ { my ( $self, $path, $node, $silent ) = @_; # a Qname (?) should actually be: [a-Z_][\w\.\-]*\:[a-Z_][\w\.\-]* if ( $path eq "" ) { $self->debug("node found!"); return $node; } else { my $list = []; foreach my $item (@$node) { my $sublist = $self->__try_a_step__( $path, $item, $silent ); push( @$list, @$sublist ); } return $list; } } sub __try_a_step__ { my ( $self, $path, $node, $silent ) = @_; $self->_indent(); $self->debug("Trying $path >"); if ( $path =~ s/^\/parent\:\:node\(\)// ) { # /.. # $self->debug(qq{getting parent ("$path")}); return &__parent__( $self, $path, $node, $silent ); } elsif ( $path =~ s/^\/attribute\:\:(\*|[\w\.\:\-]+)// ) { # /@attr # $self->debug(qq{getting attribute `$1' ("$path")}); return &__attribute__( $self, $1, $path, $node, $silent ); } elsif ( $path =~ s/^\/descendant\-or\-self\:\:node\(\)\/(child\:\:|)(\*|[\w\.\:\-]+)\[(\S+?)\]// ) { # //elem[n] # $self->debug(qq{getting deep indexed element `$1' `$2' ("$path")}); return &__indexed_element__( $self, $1, $2, $path, $node, $silent, "deep" ); } elsif ( $path =~ s/^\/descendant\-or\-self\:\:node\(\)\/(\*|[\w\.\:\-]+)// ) { # //elem # $self->debug(qq{getting deep element `$1' ("$path")}); return &__element__( $self, $1, $path, $node, $silent, "deep" ); } elsif ( $path =~ s/^\/(child\:\:|)(\*|[\w\.\:\-]+)\[(\S+?)\]// ) { # /elem[n] # $self->debug(qq{getting indexed element `$2' `$3' ("$path")}); return &__indexed_element__( $self, $2, $3, $path, $node, $silent ); } elsif ( $path =~ s/^\/(child\:\:|)text\(\)// ) { # /text() # $self->debug(qq{getting text ("$path")}); return &__get_nodes__( $self, TEXT_NODE, $path, $node, $silent ); } elsif ( $path =~ s/^\/(child\:\:|)processing-instruction\(\)// ) { # /processing-instruction() # $self->debug(qq{getting processing instruction ("$path")}); return $self->__get_nodes__(PROCESSING_INSTRUCTION_NODE, $path, $node, $silent ); } elsif ( $path =~ s/^\/(child\:\:|)comment\(\)// 
) { # /comment() # $self->debug(qq{getting comment ("$path")}); return &__get_nodes__( $self, COMMENT_NODE, $path, $node, $silent ); } elsif ( $path =~ s/^\/(child\:\:|)(\*|[\w\.\:\-]+)// ) { # /elem # $self->debug(qq{getting element `$2' ("$path")}); return &__element__( $self, $2, $path, $node, $silent ); } else { $self->warn( "get-node-from-path: Don't know what to do with path $path !!!"); return []; } } sub __parent__ { my ( $self, $path, $node, $silent ) = @_; $self->_indent(); if ( ( $node->getNodeType == DOCUMENT_NODE ) || ( $node->getNodeType == DOCUMENT_FRAGMENT_NODE ) ) { $self->debug("no parent!"); $node = []; } else { $node = $node->getParentNode; $node = &__get_node_set__( $self, $path, [$node], $silent ); } $self->_outdent(); return $node; } sub __indexed_element__ { my ( $self, $element, $index, $path, $node, $silent, $deep ) = @_; $index ||= 0; $deep ||= ""; # False # if ( $index =~ /^first\s*\(\)/ ) { $index = 0; } elsif ( $index =~ /^last\s*\(\)/ ) { $index = -1; } else { $index--; } my @list = $node->getElementsByTagName( $element, $deep ); if (@list) { $node = $list[$index]; } else { $node = ""; } $self->_indent(); if ($node) { $node = &__get_node_set__( $self, $path, [$node], $silent ); } else { $self->debug("failed!"); $node = []; } $self->_outdent(); return $node; } sub __element__ { my ( $self, $element, $path, $node, $silent, $deep ) = @_; $deep ||= ""; # False # $node = [ $node->getElementsByTagName( $element, $deep ) ]; $self->_indent(); if (@$node) { $node = &__get_node_set__( $self, $path, $node, $silent ); } else { $self->debug("failed!"); } $self->_outdent(); return $node; } sub __attribute__ { my ( $self, $attribute, $path, $node, $silent ) = @_; if ( $attribute eq '*' ) { $node = [ $node->getAttributes->getValues ]; $self->_indent(); if ($node) { $node = &__get_node_set__( $self, $path, $node, $silent ); } else { $self->debug("failed!"); } $self->_outdent(); } else { $node = $node->getAttributeNode($attribute); $self->_indent(); if 
($node) { $node = &__get_node_set__( $self, $path, [$node], $silent ); } else { $self->debug("failed!"); $node = []; } $self->_outdent(); } return $node; } sub __get_nodes__ { my ( $self, $node_type, $path, $node, $silent ) = @_; my $result = []; $self->_indent(); foreach my $child ( $node->getChildNodes ) { if ( $child->getNodeType == $node_type ) { push @{$result}, @{$self->__get_node_set__($path, [$child], $silent )}; } } $self->_outdent(); if ( !@$result ) { $self->debug("failed!"); } return $result; } sub _attribute_value_of { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $variables ) = @_; foreach my $attribute ( $xsl_node->getAttributes->getValues ) { my $value = $attribute->getValue; study($value); #$value =~ s/(\*|\$|\@|\&|\?|\+|\\)/\\$1/g; $value =~ s/(\*|\?|\+)/\\$1/g; study($value); while ( $value =~ /\G[^\\]?\{(.*?[^\\]?)\}/ ) { my $node = $self->_get_node_set( $1, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); if (@$node) { $self->_indent(); my $text = $self->__string__( $$node[0] ); $self->_outdent(); $value =~ s/(\G[^\\]?)\{(.*?)[^\\]?\}/$1$text/; } else { $value =~ s/(\G[^\\]?)\{(.*?)[^\\]?\}/$1/; } } #$value =~ s/\\(\*|\$|\@|\&|\?|\+|\\)/$1/g; $value =~ s/\\(\*|\?|\+)/$1/g; $value =~ s/\\(\{|\})/$1/g; $attribute->setValue($value); } } sub _processing_instruction { my ( $self, $xsl_node, $current_result_node, $variables, $oldvariables ) = @_; my $new_PI_name = $xsl_node->getAttribute('name'); if ( $new_PI_name eq "xml" ) { $self->warn( "<" . $self->xsl_ns() . "processing-instruction> may not be used to create XML" ); $self->warn( "declaration. Use <" . $self->xsl_ns() . "output> instead..." ); } elsif ($new_PI_name) { my $text = $self->__string__($xsl_node); my $new_PI = $self->xml_document() ->createProcessingInstruction( $new_PI_name, $text ); if ($new_PI) { $self->_move_node( $new_PI, $current_result_node ); } } else { $self->warn( q{Expected attribute "name" in <} . 
$self->xsl_ns() . "processing-instruction> !" ); } } sub _process_with_params { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $variables, $params ) = @_; my @params = $xsl_node->getElementsByTagName( $self->xsl_ns() . "with-param" ); foreach my $param (@params) { my $varname = $param->getAttribute('name'); if ($varname) { my $value = $param->getAttribute('select'); if ( !$value ) { # process content as template $value = $self->xml_document()->createDocumentFragment; $self->_evaluate_template( $param, $current_xml_node, $current_xml_selection_path, $value, $variables, {} ); $$params{$varname} = $value; } else { # *** FIXME - should evaluate this as an expression! $$params{$varname} = $value; } } else { $self->warn( q{Expected attribute "name" in <} . $self->xsl_ns() . q{with-param> !} ); } } } sub _call_template { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $params = {}; my $newvariables = defined $variables ? {%$variables} : {}; my $name = $xsl_node->getAttribute('name'); if ($name) { $self->debug(qq{calling template named "$name"}); $self->_process_with_params( $xsl_node, $current_xml_node, $current_xml_selection_path, $variables, $params ); $self->_indent(); my $template = $self->_match_template( "name", $name, 0, '' ); if ($template) { $self->_evaluate_template( $template, $current_xml_node, $current_xml_selection_path, $current_result_node, $newvariables, $params ); } else { $self->warn("no template named $name found!"); } $self->_outdent(); } else { $self->warn( q{Expected attribute "name" in <} . $self->xsl_ns() . 
q{call-template/>} ); } } sub _choose { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->debug("evaluating choose:"); $self->_indent(); my $notdone = "true"; my $testwhen = "active"; foreach my $child ( $xsl_node->getElementsByTagName( '*', 0 ) ) { if ( $notdone && $testwhen && ( $child->getTagName eq $self->xsl_ns() . "when" ) ) { my $test = $child->getAttribute('test'); if ($test) { my $test_succeeds = $self->_evaluate_test( $test, $current_xml_node, $current_xml_selection_path, $variables ); if ($test_succeeds) { $self->_evaluate_template( $child, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); $testwhen = ""; $notdone = ""; } } else { $self->warn( q{expected attribute "test" in <} . $self->xsl_ns() . q{when>} ); } } elsif ( $notdone && ( $child->getTagName eq $self->xsl_ns() . "otherwise" ) ) { $self->_evaluate_template( $child, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); $notdone = ""; } } if ($notdone) { $self->debug("nothing done!"); } $self->_outdent(); } sub _if { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->debug("evaluating if:"); $self->_indent(); my $test = $xsl_node->getAttribute('test'); if ($test) { my $test_succeeds = $self->_evaluate_test( $test, $current_xml_node, $current_xml_selection_path, $variables ); if ($test_succeeds) { $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ); } } else { $self->warn( q{expected attribute "test" in <} . $self->xsl_ns() . 
q{if>} ); } $self->_outdent(); } sub __evaluate_test__ { my ( $self, $test, $path, $node, $variables ) = @_; my $tagname = eval { $node->getTagName() } || ''; my ( $content, $test_cond, $expval, $lhs ); $self->debug(qq{testing with "$test" and $tagname}); if ($test =~ /^\s*(\S+?)\s*(<=|>=|!=|<|>|=)\s*['"]?([^'"]*?)['"]?\s*$/) { $lhs = $1; $test_cond = $2; $expval = $3; } $self->debug("Test LHS: $lhs"); if ( $lhs =~ /^\@([\w\.\:\-]+)$/ ) { $self ->debug("Attribute: $1"); $content = $node->getAttribute($1); } elsif ( $lhs =~ /^([\w\.\:\-]+)$/ ) { $self ->debug("Path: $1"); my $test_path = $1; my $nodeset = $self->_get_node_set( $test_path, $self->xml_document(), $path, $node, $variables ); return ( $expval ne '' ) unless @$nodeset; $content = &__string__( $self, $$nodeset[0] ); } else { $self->debug("no match for test"); return ""; } my $numeric = ($content =~ /^\d+$/ && $expval =~ /^\d+$/ ? 1 : 0); $self->debug("evaluating $content $test $expval"); $test_cond =~ s/\s+//g; if ( $test_cond eq '!=' ) { return $numeric ? $content != $expval : $content ne $expval; } elsif ( $test_cond eq '=' ) { return $numeric ? $content == $expval : $content eq $expval; } elsif ( $test_cond eq '<' ) { return $numeric ? $content < $expval : $content lt $expval; } elsif ( $test_cond eq '>' ) { return $numeric ? $content > $expval : $content gt $expval; } elsif ( $test_cond eq '>=' ) { return $numeric ? $content >= $expval : $content ge $expval; } elsif ( $test_cond eq '<=' ) { return $numeric ? 
$content <= $expval : $content le $expval; } else { $self->debug("no test matches"); return 0; } } sub _copy_of { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables ) = @_; my $nodelist; my $select = $xsl_node->getAttribute('select'); $self->debug(qq{evaluating copy-of with select "$select":}); $self->_indent(); if ($select) { $nodelist = $self->_get_node_set( $select, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); } else { $self->warn( q{expected attribute "select" in <} . $self->xsl_ns() . q{copy-of>} ); } foreach my $node (@$nodelist) { $self->_add_node( $node, $current_result_node, "deep" ); } $self->_outdent(); } sub _copy { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->debug("evaluating copy:"); $self->_indent(); if ( $current_xml_node->getNodeType == ATTRIBUTE_NODE ) { my $attribute = $current_xml_node->cloneNode(0); $current_result_node->setAttributeNode($attribute); } elsif (( $current_xml_node->getNodeType == COMMENT_NODE ) || ( $current_xml_node->getNodeType == PROCESSING_INSTRUCTION_NODE ) ) { $self->_add_node( $current_xml_node, $current_result_node ); } else { $self->_add_node( $current_xml_node, $current_result_node ); $self->_apply_attribute_set($xsl_node,$current_result_node->getLastChild()); $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node->getLastChild, $variables, $oldvariables ); } $self->_outdent(); } sub _text { #=item addText (text) # #Appends the specified string to the last child if it is a Text node, or else #appends a new Text node (with the specified text.) # #Return Value: the last child if it was a Text node or else the new Text node. 
my ( $self, $xsl_node, $current_result_node ) = @_; $self->debug("inserting text:"); $self->_indent(); $self->debug("stripping node to text:"); $self->_indent(); my $text = $self->__string__($xsl_node); $self->_outdent(); if ( $text ne '' ) { my $node = $self->xml_document()->createTextNode($text); if ( $xsl_node->getAttribute('disable-output-escaping') eq 'yes' ) { $self->debug("disabling output escaping"); bless $node, 'XML::XSLT::DOM::TextDOE'; } $self->_move_node( $node, $current_result_node ); } else { $self->debug("nothing left.."); } $current_result_node->normalize(); $self->_outdent(); } sub _attribute { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; my $name = $xsl_node->getAttribute('name'); $self->debug(qq{inserting attribute named "$name":}); $self->_indent(); if ($name) { if ( $name =~ /^xmlns:/ ) { $self->debug("Won't create namespace declaration"); } else { my $result = $self->xml_document()->createDocumentFragment; $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $result, $variables, $oldvariables ); $self->_indent(); my $text = $self->fix_attribute_value( $self->__string__($result) ); $self->_outdent(); $current_result_node->setAttribute( $name, $text ); $result->dispose(); } } else { $self->warn( q{expected attribute "name" in <} . $self->xsl_ns() . 
q{attribute>} ); } $self->_outdent(); } sub _comment { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $oldvariables ) = @_; $self->debug("inserting comment:"); $self->_indent(); my $result = $self->xml_document()->createDocumentFragment; $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $result, $variables, $oldvariables ); $self->_indent(); my $text = $self->__string__($result); $self->_outdent(); $self->_move_node( $self->xml_document()->createComment($text), $current_result_node ); $result->dispose(); $self->_outdent(); } sub _variable { my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path, $current_result_node, $variables, $params, $is_param ) = @_; my $varname = $xsl_node->getAttribute('name'); if ($varname) { $self->debug("definition of variable \$$varname:"); $self->_indent(); if ( $is_param and exists $$params{$varname} ) { # copy from parent-template $$variables{$varname} = $$params{$varname}; } else { # new variable definition my $value = $xsl_node->getAttribute('select'); if ( !$value ) { #tough case, evaluate content as template $value = $self->xml_document()->createDocumentFragment; $self->_evaluate_template( $xsl_node, $current_xml_node, $current_xml_selection_path, $value, $variables, $params ); } else # either a literal or path { if ( $value =~ /'(.*)'/ ) { $value = $1; } else { my $node = $self->_get_node_set( $value, $self->xml_document(), $current_xml_selection_path, $current_xml_node, $variables ); $value = $self->__string__($node); } } $variables->{$varname} = $value; } $self->_outdent(); } else { $self->warn( q{expected attribute "name" in <} . $self->xsl_ns() . q{param> or <} . $self->xsl_ns() . 
              q{variable>} );
    }
}

# not implemented - but log it and make it go away
sub _sort {
    my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path,
        $current_result_node, $variables, $params, $is_param )
      = @_;

    $self->debug("dummy process for sort");
}

# Not quite sure how fallback should be implemented as the spec seems a
# little vague to me
sub _fallback {
    my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path,
        $current_result_node, $variables, $params, $is_param )
      = @_;

    $self->debug("dummy process for fallback");
}

# This is a no-op - attribute-sets should not appear within templates and
# we have already processed the stylesheet wide ones.
sub _attribute_set {
    my ( $self, $xsl_node, $current_xml_node, $current_xml_selection_path,
        $current_result_node, $variables, $params, $is_param )
      = @_;

    $self->debug("in _attribute_set");
}

sub _indent {
    my ($self) = @_;
    $self->{INDENT} += $self->{INDENT_INCR};
}

sub _outdent {
    my ($self) = @_;
    $self->{INDENT} -= $self->{INDENT_INCR};
}

sub fix_attribute_value {
    my ( $self, $text ) = @_;

    # The spec says that there can't be a literal line break in the
    # attribute's value - white space at the beginning or the end is
    # almost certainly a mistake.
    $text =~ s/^\s+//g;
    $text =~ s/\s+$//g;

    if ($text) {

        # Encode remaining line breaks as hexadecimal character references
        # (&#x0A; / &#x0D;). The digits are hex, so the "x" is required.
        $text =~ s/([\x0A\x0D])/sprintf("\&#x%02X;",ord $1)/eg;
    }

    return $text;
}

1;

__DATA__

=head1 SYNOPSIS

 use XML::XSLT;

 my $xslt = XML::XSLT->new ($xsl, warnings => 1);

 $xslt->transform ($xmlfile);
 print $xslt->toString;

 $xslt->dispose();

=head1 DESCRIPTION

This module implements the W3C's XSLT specification. The goal is full
implementation of this spec, but we have not yet achieved
that. However, it already works well. See L</"XML::XSLT Commands"> for
the current status of each command.

XML::XSLT makes use of XML::DOM and LWP::Simple, while XML::DOM
uses XML::Parser. Therefore XML::Parser, XML::DOM and LWP::Simple
have to be installed properly for XML::XSLT to run.

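The prerequisite list above can be checked before loading XML::XSLT. The following is only a minimal sketch: C<missing_modules> is a hypothetical helper written for this illustration, not part of XML::XSLT, and it assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::XSLT): returns the names of the
# given modules that cannot be loaded in the current Perl installation.
sub missing_modules {
    my @missing;
    foreach my $module (@_) {

        # Turn Foo::Bar into Foo/Bar.pm so that require() accepts a string.
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# The three prerequisites named in the DESCRIPTION.
my @missing = missing_modules(qw(XML::Parser XML::DOM LWP::Simple));
if (@missing) {
    print "Missing prerequisites: @missing\n";
}
else {
    print "All prerequisites for XML::XSLT can be loaded.\n";
}
```

Run as a standalone script, this reports which of the three modules still need to be installed. It only tests that each module loads; it does not check versions.
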
=head1 Specifying Sources

The stylesheets and the documents may be passed as filenames, file
handles, regular strings, string references or DOM-trees. Functions that
require sources (e.g. new) will accept either a named parameter or
simply the argument.

Either of the following is allowed:

 my $xslt = XML::XSLT->new($xsl);
 my $xslt = XML::XSLT->new(Source => $xsl);

In documentation, the named parameter `Source' is always shown, but it
is never required.

=head2 METHODS

=over 4

=item new(Source => $xml [, %args])

Returns a new XSLT parser object. Valid flags are:

=over 2

=item DOMparser_args

Hashref of arguments to pass to the XML::DOM::Parser object's parse
method.

=item variables

Hashref of variables and their values for the stylesheet.

=item base

Base of URL for file inclusion.

=item debug

Turn on debugging messages.

=item warnings

Turn on warning messages.

=item indent

Starting amount of indentation for debug messages. Defaults to 0.

=item indent_incr

Amount to indent each level of debug message. Defaults to 1.

=back

=item open_xml(Source => $xml [, %args])

Gives the XSLT object new XML to process. Returns an XML::DOM object
corresponding to the XML.

=over 4

=item base

The base URL to use for opening documents.

=item parser_args

Arguments to pass to the parser.

=back

=item open_xsl(Source => $xml [, %args])

Gives the XSLT object a new stylesheet to use in processing XML.
Returns an XML::DOM object corresponding to the stylesheet. Any
arguments present are passed to the XML::DOM::Parser.

=over 4

=item base

The base URL to use for opening documents.

=item parser_args

Arguments to pass to the parser.

=back

=item process(%variables)

Processes the previously loaded XML through the stylesheet using the
variables set in the argument.

=item transform(Source => $xml [, %args])

Processes the given XML through the stylesheet. Returns an XML::DOM
object corresponding to the transformed XML. Any arguments present
are passed to the XML::DOM::Parser.

=item serve(Source => $xml [, %args])

Processes the given XML through the stylesheet. Returns a string
containing the result. Example:

  use XML::XSLT qw(serve);

  $xslt = XML::XSLT->new($xsl);
  print $xslt->serve($xml);

=over 4

=item http_headers

If true, then prepends the appropriate HTTP headers (e.g. Content-Type,
Content-Length). Defaults to true.

=item xml_declaration

If true, then the result contains the appropriate XML declaration.
Defaults to true.

=item xml_version

The version of the XML. Defaults to 1.0.

=item doctype

The type of DOCTYPE this document is. Defaults to SYSTEM.

=back

=item toString

Returns the result of transforming the XML with the stylesheet as a
string.

=item to_dom

Returns the result of transforming the XML with the stylesheet as an
XML::DOM object.

=item media_type

Returns the media type (aka mime type) of the object.

=item dispose

Executes the C<dispose> method on each XML::DOM object.

=back

=head1 XML::XSLT Commands

=over 4

=item xsl:apply-imports    no

Not supported yet.

=item xsl:apply-templates    limited

Attribute 'select' is supported to the same extent as xsl:value-of
supports path selections.

Not supported yet:
- attribute 'mode'
- xsl:sort and xsl:with-param in content

=item xsl:attribute    partially

Adds an attribute whose name is the value of the attribute 'name' and
whose value is the stringified content-template.

Not supported yet:
- attribute 'namespace'

=item xsl:attribute-set    yes

Partially

=item xsl:call-template    yes

Takes attribute 'name' which selects xsl:template's by name.

Weak support:
- xsl:with-param (select attrib not supported)

Not supported yet:
- xsl:sort

=item xsl:choose    yes

Tests sequentially all xsl:whens until one succeeds or
until an xsl:otherwise is found. Limited test support, see xsl:when

=item xsl:comment    yes

Supported.

=item xsl:copy    partially

=item xsl:copy-of    limited

Attribute 'select' functions as well as with
xsl:value-of

=item xsl:decimal-format    no

Not supported yet.

=item xsl:element    yes

=item xsl:fallback    no

Not supported yet.

=item xsl:for-each    limited

Attribute 'select' functions as well as with
xsl:value-of

Not supported yet:
- xsl:sort in content

=item xsl:if    limited

Identical to xsl:when, but outside xsl:choose context.

=item xsl:import    no

Not supported yet.

=item xsl:include    yes

Takes attribute href, which can be relative-local,
absolute-local as well as a URL (preceded by
identifier http:).

=item xsl:key    no

Not supported yet.

=item xsl:message    no

Not supported yet.

=item xsl:namespace-alias    no

Not supported yet.

=item xsl:number    no

Not supported yet.

=item xsl:otherwise    yes

Supported.

=item xsl:output    limited

Only the initial xsl:output element is used. The "text" output method
is not supported, but shouldn't be difficult to implement. Only the
"doctype-public", "doctype-system", "omit-xml-declaration", "method",
and "encoding" attributes have any support.

=item xsl:param    experimental

Synonym for xsl:variable (currently). See xsl:variable for support.

=item xsl:preserve-space    no

Not supported yet. Whitespace is always preserved.

=item xsl:processing-instruction    yes

Supported.

=item xsl:sort    no

Not supported yet.

=item xsl:strip-space    no

Not supported yet. No whitespace is stripped.

=item xsl:stylesheet    limited

Minor namespace support: other namespace than 'xsl:' for
xsl-commands is allowed if xmlns-attribute is present. xmlns URL
is verified. Other attributes are ignored.

=item xsl:template    limited

Attributes 'name' and 'match' are supported to a minor extent.
('name' must match exactly and 'match' must match with full
path or no path)

Not supported yet:
- attributes 'priority' and 'mode'

=item xsl:text    yes

Supported.

=item xsl:transform    limited

Synonym for xsl:stylesheet

=item xsl:value-of    limited

Inserts attribute or element values. Limited support:

and combinations of these.

Not supported yet:
- attribute 'disable-output-escaping'

=item xsl:variable    partial

or from literal text in the stylesheet.

=item xsl:when    limited

Only inside xsl:choose.
Limited test support: path is supported to the same extent as with
xsl:value-of

=item xsl:with-param    experimental

It is currently not functioning. (or is it?)

=back

=head1 SUPPORT

General information, bug reporting tools, the latest version, mailing
lists, etc. can be found at the XML::XSLT homepage:

  http://xmlxslt.sourceforge.net/

=head1 DEPRECATIONS

Methods and interfaces from previous versions that are not documented in this
version are deprecated. Each of these deprecations can still be used
but will produce a warning when the deprecation is first used. You
can use the old interfaces without warnings by passing C<new()> the
flag C<use_deprecated>. Example:

 $parser = XML::XSLT->new($xsl, "FILE", use_deprecated => 1);

The deprecated methods will disappear by the time a 1.0 release is made.

The deprecated methods are:

=over 2

=item output_string

use toString instead

=item result_string

use toString instead

=item output

use toString instead

=item result

use toString instead

=item result_mime_type

use media_type instead

=item output_mime_type

use media_type instead

=item result_tree

use to_dom instead

=item output_tree

use to_dom instead

=item transform_document

use transform instead

=item process_project

use process instead

=item open_project

use C argument to B and B instead.

=item print_output

use B instead.

=back

=head1 BUGS

Yes.

=head1 HISTORY

Geert Josten and Egon Willighagen developed and maintained XML::XSLT
up to version 0.22. At that point, Mark Hershberger started moving
the project to Sourceforge and began working on it with Bron
Gondwana.

=head1 LICENCE

Copyright (c) 1999 Geert Josten & Egon Willighagen. All Rights
Reserved. This module is free software, and may be distributed under
the same terms and conditions as Perl.

=head1 AUTHORS

Geert Josten

Egon Willighagen

Mark A. Hershberger

Bron Gondwana

Jonathan Stowe

=head1 SEE ALSO

L<XML::DOM>, L<LWP::Simple>, L<XML::Parser>

=cut

Filename: $RCSfile: XSLT.pm,v $
Revision: $Revision: 1.25 $
   Label: $Name:  $

Last Chg: $Author: gellyfish $
      On: $Date: 2004/02/19 08:38:40 $

  RCS ID: $Id: XSLT.pm,v 1.25 2004/02/19 08:38:40 gellyfish Exp $
    Path: $Source: /cvsroot/xmlxslt/XML-XSLT/lib/XML/XSLT.pm,v $