PXTL

the Python XML Templating Language -
a proposal and specification in progress

Introduction

This is the first draft of the spec for PXTL. It is incomplete, may contain errors, and includes things likely to change before any implementation exists. Please contact me with any comments you have, especially on things marked as issues. -- Andrew Clover

Rationale

Python is a great language for developing web applications, but it lacks a clean strategy for outputting dynamic pages. Interleaving a Python code hierarchy with the output markup hierarchy tends to produce difficult-to-read code; Python's reliance on indentation makes it impossible to format the code to fit within the markup hierarchy.

The work done by the JSR-052 group on templating in Java demonstrates the benefits of a taglib-style approach, using a single code and markup hierarchy to express the design more clearly. It is the aim of PXTL to bring something similar to Python, whilst improving on the design of JSTL where it is inflexible or confusing (often due to the design of JSP/taglibs or Java in general) and working around some of the difficulties inherent in using Python for templating, in particular by providing tags for all flow control operations that would normally require indented code blocks.

Properties

Unlike JSTL, PXTL is a pure-XML-based templating system. Source files, conventionally using the .px suffix, must be well-formed XML; output is in XML, unless the optional legacy-HTML support is enabled. Nesting errors in code using JSTL taglibs can be hard to track down, as the source document is not well-formed and cannot be validated; it is hoped that sticking to well-formed XML will ease debugging.

Proper nesting must be observed: it is impossible, for example, to conditionally output a start-tag separately from its end-tag. Characters must also be escaped according to normal XML rules. For example, Python code in an attribute must have any ampersands replaced with &amp;amp; and double-quotes replaced with &amp;quot; if double-quotes are used as the attribute delimiter. Luckily ampersands are rare in Python code and both Python and XML can use single as well as double quotes, so having to escape characters in code should be reasonably infrequent.
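As a rough illustration of the escaping rule above, the standard-library function xml.sax.saxutils.escape can prepare a Python expression for a double-quoted attribute. The helper name escape_for_attribute is hypothetical and not part of PXTL.

```python
from xml.sax.saxutils import escape


def escape_for_attribute(python_code):
    """Escape Python source for use inside a double-quoted XML attribute."""
    # escape() always handles '&', '<' and '>'; the extra mapping adds '"'.
    return escape(python_code, {'"': '&quot;'})


escaped = escape_for_attribute('flags & 1 and name == "x"')
print(escaped)
```

Using single quotes in the Python code (or as the attribute delimiter) avoids the quote escaping altogether.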

It is a further aim that it should usually be theoretically possible to prove that the template will generate a valid output document when run without actually having to execute the Python code in the template. Obviously this will not be possible for all constructs, but for general structural constructs a validator should be able to try each possible route through the template and ensure that each will produce, for example, valid HTML.

PXTL uses elements and attributes from the PXTL namespace for structural directives, and processing instructions for insertion of Python code and data-output into a page. It is also possible to put processing instruction-style constructs into arbitrary attributes, subject to normal character escaping rules for attributes.

PXTL files are intended only for use as an output template. They are not a webapp framework. Using them encourages clean separation of action and view, but does not require it; architecture is left to the author.

Attributes defined by PXTL take their contents as being Python code and execute that directly, at the point at which they are encountered in the document. There is no need for a supplementary "expression language" as in JSTL/JSP, as Python does not suffer from the verbosity that makes Java unsuitable for use as an expression language. The other property of the JSTL/JSP EL is that it avoids throwing exceptions when, e.g., unknown variables are referenced. This is not considered desirable for PXTL as it hinders debugging.

Example document

This example demonstrates some of the flavour of the language. In this document, PXTL directives are coloured blue, and actual Python code is red. (An XML-aware text editor that can check for well-formedness and syntax-color tags from different namespaces would obviously be an advantage when writing PXTL templates.)

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:px="http://www.doxdesk.com/pxtl">
<head>
  <title>
    <px:if cond="obj!=None"> <?: obj.name ?> </px:if>
    <px:else> Index </px:else>
  </title>
</head><body>
  <px:for var="thing" in="db.getThings()">
    <a href="obj.py?obj=<?px:par thing.id ?>" px:if="obj!=thing">
      Object: <?: thing.name ?>
    </a>
  </px:for>
</body></html>

Output PIs

PXTL offers a range of XML Processing Instructions to execute Python code and output some of its results. Which one to use depends on how you want the output to be processed. The target name used generally begins with 'px:' (this is not a namespace, though it looks like one; PIs do not have namespaces in XML).

PXTL PIs can be inserted in the document content as normal, but they can also be included as part of an attribute value. Technically this is not a PI, as XML does not interpret '<?' as beginning anything special in attribute values. But PXTL looks for these PI-like directives in attributes too so they work just as well. They cannot, however, appear inside markup other than attributes - "<input <?px:text 'checked' ?>>" would not be well-formed XML.
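One way a processor might locate these PI-like directives in an attribute value is a simple pattern match. This is only a sketch: the regular expression and the name find_directives are assumptions, and the spec does not say how an implementation should scan attributes.

```python
import re

# '<?', a target name, whitespace, the expression, then '?>'.
# The non-greedy match lets several directives share one attribute value.
DIRECTIVE = re.compile(r'<\?(\S+)\s+(.*?)\s*\?>', re.DOTALL)


def find_directives(attribute_value):
    """Return (target, expression) pairs found in an attribute value."""
    return DIRECTIVE.findall(attribute_value)


print(find_directives('obj.py?obj=<?px:par thing.id ?>'))
```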

px:text

This target evaluates the Python expression following the target, tries to convert it to a string if it is not already, and writes it to the document, escaping the XML special characters '<' and '&' automatically. Example:

<p> Hello, <?px:text username ?> </p>

Since this is by far the most common PI, it can be referred to using the shortcut target name of just a colon:

<p> Hello again, <?: username ?> </p>
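In Python terms, the behaviour described for px:text might look roughly like the following. The function name render_text is hypothetical; only the two characters named above are escaped.

```python
def render_text(value):
    """Convert a value to a string and escape '&' and '<' for XML content."""
    text = value if isinstance(value, str) else str(value)
    # '&' must be replaced first so the ampersand in '&lt;' is not re-escaped.
    return text.replace('&', '&amp;').replace('<', '&lt;')


print(render_text('Fish & <chips>'))
```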

px:code

This target executes a block of Python code. No output is generated. The code can either be a single line, or a newline followed by a block of code. The indenting must be consistent within the block, but need not be the same as any other code in the file.

<?px:code
  (x, y)= foo.getPosition()
  debug('gotPosition')
?>

This PI also has a shortcut, 'px:' -

<?px: (x, y)= foo.getPosition() ?>
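Because the block's indentation need not match the rest of the file, a processor would presumably strip the common leading whitespace before executing it. A minimal sketch, assuming the PI data arrives as a string; run_code is a hypothetical name:

```python
import textwrap


def run_code(pi_data, namespace):
    """Execute the data of a px:code PI in the given namespace."""
    # dedent() removes indentation common to all lines, so the block may be
    # indented to any depth in the template; strip() drops the surrounding
    # blank lines and spaces.
    source = textwrap.dedent(pi_data).strip()
    exec(source, namespace)


scope = {}
run_code("\n    x = 1\n    y = x + 1\n", scope)
print(scope['y'])
```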

px:par

This target evaluates its expression, converts to string where necessary, and encodes each character to be safe for inclusion in a URL query string parameter.

<a href="show.py?id=<?px:par thing.id ?>;display=on">
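The encoding px:par performs is similar to what urllib.parse.quote does in current Python when no characters are declared safe. This is a sketch with a hypothetical function name, not a statement of how PXTL would implement it.

```python
from urllib.parse import quote


def encode_parameter(value):
    """Percent-encode a value for use as a URL query string parameter."""
    # safe='' means that even '/' is encoded, as is every other reserved character.
    return quote(str(value), safe='')


print(encode_parameter('a b&c=d'))
```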

Issue: should there be a shortcut for this reasonably common type of escaping? XML's limitations on what can be in a target name (eg. not '%') limit the possibilities somewhat.

px:markup

This target evaluates its expression, converts to string and outputs it without any kind of escaping. The string may contain markup, which will be passed straight to the output document.

<?px: c= string.replace(comment, '\n', '<br />') ?>
Comment: <?px:markup c ?>

px:jssl

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a JavaScript string literal, which might be embedded in an HTML <script> block. Backslashes and quotes are escaped, along with sequences that could possibly end a CDATA section.

<script type="text/javascript">
  window.alert('Hello, <?px:jssl username ?>');
</script>
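A possible shape for this escaping is sketched below. The function name is hypothetical, and the handling of newlines is an assumption beyond what the text above specifies.

```python
def escape_js_string(value):
    """Escape a value for use inside a quoted JavaScript string literal."""
    text = value if isinstance(value, str) else str(value)
    replacements = [
        ('\\', '\\\\'),    # backslashes first, so later escapes are not doubled
        ("'", "\\'"),
        ('"', '\\"'),
        ('\n', '\\n'),     # assumption: raw newlines would break the literal
        (']]>', ']]\\>'),  # cannot close an enclosing CDATA section
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


print(escape_js_string("O'Brien"))
```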

px:onjssl

This target evaluates its expression, converts to string and outputs it in a form suitable for inclusion in a JavaScript string literal being used inside an HTML event handler.

<a href="mypage.html" onclick="window.alert('<?px:onjssl username ?> clicked on me!');">

px:comment

This target ignores its contents and outputs nothing. Use it for adding comments to the template that should not be visible in the output. A shortcut target '--' is available; normal usage would be:

<?-- FIXME name should not be outputted for anonymous user ?>
<?: user.name ?>

Structural elements

if

Conditionally include the contents of the element in the output document. This element has one attribute, cond, containing a Python expression, which is evaluated as a boolean. If true, the element's content is included at that point in the document. If false, all content inside the px:if element is discarded; no Python code inside the block is executed.

<a href="message.html">Next message</a>
<px:if cond="user.isAdmin">
  <a href="remove.py">Remove message</a>
</px:if>

else

Include the contents of this element if the previous element's condition evaluated to false. This element must directly follow an element that has a condition, such as <px:if>, with only whitespace allowed between them.

Hello,
<px:if cond="user.name!=''">
  <em><?: user.name ?></em>
</px:if><px:else>
  whoever you are.
</px:else>

(When inside a switch parent, else has a different meaning - see the section on switch for more.)

elif

Include the contents of this element if the previous element evaluated false and the condition specified in cond holds true.

Hello,
<px:if cond="user.sex=='M'">
  Sir
</px:if><px:elif cond="user.sex=='F'">
  Madam
</px:elif><px:else>
  you
</px:else>

anif

Include the contents of this element if the previous element evaluated true and the condition specified in cond holds true.

If followed by an else (or elif) element, the else-part is considered true if either the anif-condition is false, or the anif-condition never gets evaluated due to the preceding if-condition being false. This example code shows how an anif-clause can be used to allow an else-clause to take over if any of the preceding tests failed:

<px:if cond="user!=None">
  <h2> User '<?: user.name ?>' </h2>
</px:if><px:anif cond="user.email!=''">
  <p> Email: <?: user.email ?> </p>
</px:anif><px:else>
  <p> (no e-mail address known) </p>
</px:else>

An anif element must directly follow an element that uses a condition, such as if or anif. However it may not follow an else or elif element (as this could generate very confusing code).
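The interaction of if, anif, elif and else can be modelled as a small state machine over a chain of sibling clauses. The following is only a sketch of that reading of the rules; evaluate_chain and its input format are hypothetical, and conditions are passed as callables so that a skipped condition is never evaluated.

```python
def evaluate_chain(clauses):
    """Decide which clauses of a conditional chain are included.

    clauses is a list of (kind, condition) pairs, where condition is a
    zero-argument callable (None for 'else').
    """
    included = []
    previous = False  # outcome carried forward from the preceding clause
    for kind, condition in clauses:
        if kind == 'if':
            result = bool(condition())
        elif kind == 'anif':
            # Short-circuits: the condition is not evaluated if the chain failed.
            result = previous and bool(condition())
        elif kind == 'elif':
            result = (not previous) and bool(condition())
        elif kind == 'else':
            result = not previous
        else:
            raise ValueError('unknown clause kind: %r' % kind)
        included.append(result)
        if kind in ('if', 'anif'):
            previous = result              # a failed anif lets else take over
        else:
            previous = previous or result  # some earlier branch was taken
    return included


print(evaluate_chain([('if', lambda: True), ('anif', lambda: False), ('else', None)]))
```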

switch

Chooses one of its child when elements and displays its contents. Has one attribute, on, which gives an expression against which the value expressions of its children are compared.

switch may only contain when elements, whitespace, and optionally one else element, which must be the last child. The contents of the first when child to match are displayed. If no child's value matches, any else child inside is displayed.

when

Includes its contents if it is the first of its siblings to have a value attribute that matches the parent's on attribute. Example:

<px:switch on="user.role">
  <px:when value="'A'">
    <strong> Administrator </strong>
  </px:when>
  <px:when value="'M'">
    <em> Moderator </em>
  </px:when>
  <px:else>
    User
  </px:else>
</px:switch>
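The selection rule might be sketched in Python as below. Treating "matches" as equality (==) is an assumption, since the spec does not yet define the comparison; select_branch is a hypothetical name.

```python
def select_branch(on_value, when_values):
    """Return the position of the first matching 'when', or None."""
    for position, candidate in enumerate(when_values):
        if candidate == on_value:  # assumption: matching means equality
            return position
    return None  # the caller falls back to the else child, if there is one


print(select_branch('M', ['A', 'M']))
```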

for

Includes its contents repeatedly, once for each item in a list, tuple or other iterable object. If a var attribute is given, it contains an l-value that will have the item written to it during each iteration. (If this l-value is a new variable, it will be left in scope after the loop has completed, as normal for Python for-loops.)

An index attribute, if given, contains an l-value to which to write the integer index of each iteration.

<px:for var="product" in="getProducts()" index="ix">
  <tr class="<?: ['whiterow', 'shadedrow'][ix%2] ?>">
    <td><?: product.name ?></td>
    <td><?: product.price ?></td>
  </tr>
</px:for>

A range attribute may be used instead of an in attribute. The value(s) in the attribute are used as parameters to a range or xrange object.

A for element is considered conditional in PXTL and so may have an else, elif or anif element following it. The for is considered 'true' if it executed at least once.
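The loop semantics, including the index and the 'executed at least once' rule for a following else, could be modelled like this. It is a sketch only; render_for and its callback arguments are hypothetical stand-ins for the template content.

```python
def render_for(items, render_item, render_else):
    """Render each item; use the else content if the loop never ran."""
    output = []
    ran = False
    for index, item in enumerate(items):
        ran = True
        output.append(render_item(item, index))
    if not ran:
        # Unlike Python's own for-else, this branch runs only for an empty loop.
        output.append(render_else())
    return ''.join(output)


rows = render_for(
    ['tea', 'coffee'],
    lambda item, ix: '<tr class="%s">%s</tr>' % (['whiterow', 'shadedrow'][ix % 2], item),
    lambda: '<tr>No products</tr>',
)
print(rows)
```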

Issue: note this is different from the meaning of for-else in Python itself, which is IMHO less useful. But there should not be too much confusion if PXTL does not include a 'break' feature. Would 'break' be useful in a templating language? I can't think of a compelling reason, but...?

Issue: also is there a use for a 'while' loop feature?

Miscellaneous elements

element

Meaningless element. This is only intended to be used in conjunction with the 'tag' attribute, when there is no 'default' tag name that can be used for validation purposes, or if the element could have any tag name at all.

def

Define a template that can be called as a function from Python code. The var attribute contains an l-value to assign the function to. The args attribute contains a list of arguments in the same format as in a Python function declaration.

<px:def var="writeComments" args="parentComment">
  <px:for var="comment" in="parentComment.children">
    <h3><?: comment.title ?></h3>
    <p><?: comment.body ?></p>
    <div class="replies">
      <?px: writeComments(comment) ?>
    </div>
  </px:for>
</px:def>

<?px: writeComments(forumRoot) ?>
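For comparison, a hand-written Python function doing roughly the same job as the template above is sketched here. The Comment class is a hypothetical stand-in for the application's objects, and the function returns a string rather than writing to the output document.

```python
from xml.sax.saxutils import escape


class Comment:
    """Hypothetical stand-in for a forum comment object."""

    def __init__(self, title, body, children=()):
        self.title = title
        self.body = body
        self.children = list(children)


def write_comments(parent_comment):
    """Recursively render the replies to a comment as markup."""
    parts = []
    for comment in parent_comment.children:
        parts.append('<h3>%s</h3>' % escape(comment.title))
        parts.append('<p>%s</p>' % escape(comment.body))
        # The recursive call mirrors the px:code call inside the template.
        parts.append('<div class="replies">%s</div>' % write_comments(comment))
    return ''.join(parts)


forum_root = Comment('', '', [Comment('Hi', 'a & b', [Comment('Re', 'ok')])])
print(write_comments(forum_root))
```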

Issue: the only other code-block-introducing feature in Python is 'try'. Is there any possible reason one might want exception handling in a templating language?

include

Include the contents of a file. The file attribute contains a pathname relative to the current .px file in the local filesystem. The parse attribute may be set to 'parse' if the file should be treated as a fragment of PXTL source to be included in the template; otherwise the contents are copied directly to the output document (like an SSI).

Conditional attributes

Whereas the structural elements dictate the inclusion or otherwise of their entire contents, the conditional attributes work on a single element alone. If they evaluate false, the element they are a part of is removed from the document tree and all its children are moved to be children of the element's parent.

The conditional attributes work and are named similarly to the conditional elements. However, dependent clauses (else, elif, etc.) are nested inside the element with the if-clause instead of following directly after it.

if

Include the element in the document if the condition evaluates to true.

<a href="index.html" px:if="currentPage!='index'">
  <img src="/img/nav/index.gif" alt="home"/>
</a>

In this example, the 'home' image will be inside a link if the if-clause evaluates true. Otherwise it will be unclickable, but it will still be there.
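The 'remove the element but keep its children' behaviour can be illustrated with the standard xml.dom.minidom module. This is a sketch of the tree operation only; the function name unwrap is hypothetical and nothing here is prescribed by the spec.

```python
from xml.dom import minidom


def unwrap(element):
    """Remove an element, moving its children up to its parent in order."""
    parent = element.parentNode
    while element.firstChild is not None:
        # insertBefore() detaches the child from its old parent first.
        parent.insertBefore(element.firstChild, element)
    parent.removeChild(element)


doc = minidom.parseString('<p><a href="index.html"><img src="home.gif"/></a></p>')
unwrap(doc.getElementsByTagName('a')[0])
print(doc.documentElement.toxml())
```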

elif

Include the element in the document if the condition evaluates true and the condition of the nearest ancestor element with an if, elif or anif conditional attribute did not evaluate true.

<h2 px:if="isHeading">
  <p class="mypara" px:elif="true">
    Lorem ipsum dolor sit amet.
  </p>
</h2>

In this example the text will either be in a heading or a paragraph (with class mypara). The attribute px:elif="true" is the idiom for an 'else' condition.

anif

Include the element in the document if the condition evaluates true and the condition of the nearest ancestor element with an if or anif conditional attribute also evaluated true. As with the conditional element, this attribute may not be used where the nearest ancestor with a conditional attribute has the 'elif' attribute.

<div class="selected" px:if="i in getSelectedIndices()">
  <p>
    <?: items[i].name ?>
    <img src="/img/selected.gif" alt="(tick)" px:anif="true"/>
  </p>
</div>

Alteration attributes

The alteration attributes affect the single elements they are declared on, rather than simply including or not including them like the conditional attributes.

Use of the alteration attributes adds flexibility to perform difficult templating tasks at the cost of not being able to reliably validate the output without running the code.

tag

Specifies an alternative tagname for the element. Should evaluate to a string (which may include a namespace prefix).

<ul class="mylist" px:tag="['ul', 'ol'][isCountable]">
  <li>
    <h2 px:tag="'h'+str(headLevel)">Info</h2>
  </li>
</ul>

attr

Adds or changes arbitrary attributes of an element. The value should evaluate to a dictionary whose keys are attribute names (possibly namespace-prefixed) and whose values are either string values or None. If an attribute name is already present in the element, it will be overwritten with the value from the dictionary. If that value is None, the attribute will be removed.

<div px:attr="{'class': [None, 'selected'][isSelected]}">
  <a px:attr="{browsercaps.idOrName: 'myanchor'}">Foo</a>
</div>
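The merge rule for px:attr amounts to a dictionary update in which None means 'delete'. A minimal sketch, using plain dictionaries to stand in for an element's attributes; apply_attr is a hypothetical name:

```python
def apply_attr(existing, changes):
    """Return the element's attributes after applying a px:attr dictionary."""
    result = dict(existing)
    for name, value in changes.items():
        if value is None:
            result.pop(name, None)  # None removes the attribute if present
        else:
            result[name] = value    # add a new attribute or overwrite one
    return result


print(apply_attr({'class': 'plain', 'id': 'x'}, {'class': None, 'title': 'Foo'}))
```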

checked

Because choosing whether HTML checkboxes and radio inputs should have the 'checked' attribute or not is extremely common, this attribute:

<input type="checkbox" px:checked="condition" />

is provided as a shorter and more validatable alternative to:

<input type="checkbox" px:attr="{'checked': [None, 'checked'][condition]}" />

selected

As checked but for the 'selected' attribute used in HTML 'option' elements.

doctype

The doctype attribute can be used only on the root element of the document. Any doctype given will be inserted in the output file just before the root element. Its value is a tuple of public and system identifier strings (either or both of which may be None; if both are None, no DOCTYPE is output) and an output method.

The output method may be 'xml', 'xhtml' or 'html'. In 'xml' mode, any elements that can only be empty are output in the XML short form (<element/>). CDATA sections are not used. xml:lang attributes are left unchanged.

In 'xhtml' mode, empty elements are output in short form when they are defined as empty in HTML; for compatibility with HTML browsers, a space is inserted before the closing slash of a short-form empty element. xml:lang attributes are copied to lang attributes. Suitably escaped CDATA sections are used for elements defined to contain CDATA in HTML.

In 'html' mode, output is transformed to valid HTML instead of XML. Attributes are minimised where possible. CDATA in elements is suitably comment-hidden. &apos; entity references are converted to &#039;.

PXTL has some built-in constants you can use as shorthand for the full doctype tuple after importing the pxtl module. They are:

name | public identifier | system identifier | output method
pxtl.XML | None | None | 'xml'
pxtl.TAGSOUP | '-//W3C//DTD HTML 4.01 Transitional//EN' | None | 'html'
pxtl.HTML4S | '-//W3C//DTD HTML 4.01//EN' | 'http://www.w3.org/TR/html4/strict.dtd' | 'html'
pxtl.HTML4T | '-//W3C//DTD HTML 4.01 Transitional//EN' | 'http://www.w3.org/TR/html4/loose.dtd' | 'html'
pxtl.HTML4F | '-//W3C//DTD HTML 4.01 Frameset//EN' | 'http://www.w3.org/TR/html4/frameset.dtd' | 'html'
pxtl.XHTML1S | '-//W3C//DTD XHTML 1.0 Strict//EN' | 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' | 'xhtml'
pxtl.XHTML1T | '-//W3C//DTD XHTML 1.0 Transitional//EN' | 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' | 'xhtml'
pxtl.XHTML1F | '-//W3C//DTD XHTML 1.0 Frameset//EN' | 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd' | 'xhtml'
pxtl.XHTML1B | '-//W3C//DTD XHTML Basic 1.0//EN' | 'http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd' | 'xhtml'
pxtl.XHTML11 | '-//W3C//DTD XHTML 1.1//EN' | 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd' | 'xml'
pxtl.XHMS | '-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN' | 'http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd' | 'xml'

If no doctype attribute is specified, PXTL looks at the tag name of the root element. If it is 'html' and is in the namespace 'http://www.w3.org/1999/xhtml' or not in any namespace, the settings for 'XHTML1S' are used. Otherwise, the 'XML' settings are used.
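The fallback rule could be expressed as a small function. This is a sketch: default_doctype_name is a hypothetical helper, and None is used here to represent 'not in any namespace'.

```python
XHTML_NAMESPACE = 'http://www.w3.org/1999/xhtml'


def default_doctype_name(root_tag, root_namespace):
    """Pick the built-in doctype settings used when px:doctype is absent."""
    if root_tag == 'html' and root_namespace in (XHTML_NAMESPACE, None):
        return 'XHTML1S'
    return 'XML'


print(default_doctype_name('html', XHTML_NAMESPACE))
```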

Implementation

(writeme.) Can we compile a .px file to a Python bytecode file for speed? It should be possible.
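As a hint that this is feasible: once a template has been translated into Python source (a step not specified here, and not shown), the built-in compile() and the marshal module can produce and reload a code object. The function names and the sample source are hypothetical.

```python
import marshal


def compile_generated_source(source, filename):
    """Compile generated Python source and serialise the code object."""
    code = compile(source, filename, 'exec')
    return marshal.dumps(code)


def run_compiled(data, namespace):
    """Reload a serialised code object and execute it."""
    exec(marshal.loads(data), namespace)


# Stand-in for the Python source a template translator might generate.
cached = compile_generated_source("result = 6 * 7", "page.px")
scope = {}
run_compiled(cached, scope)
print(scope['result'])
```

Marshalled code objects are specific to the Python version that wrote them, so a cache like this would need to be rebuilt when the interpreter changes.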

Issue: what to do about whitespace. Don't want to have to rejig all indenting, but don't want to leave masses of spare whitespace where tags have been removed like in JSP either.