PXTL

Python XML Templating Language
Tutorial

Contents:

Introduction: HTML and XML; XHTML issues in PXTL
Starting out: Running the template
Processing instructions: Output PIs; Pseudo-PIs; pxtl.write
PXTL elements: Conditionals; Iterators; Subtemplates; Imports
PXTL attributes: Conditionals; Alterations; Whitespace; Root attributes

Introduction

PXTL is an XML document type allowing embedded Python code to control the generation of transformed document content.

It can be used for web page templating, in a style similar to existing templating languages such as PHP, JSP and ASP. Unlike these languages it is purely XML-based.

The example templates in this tutorial are all XHTML, because XHTML is a vocabulary well-known to most XML authors. However, any XML document type can be produced, as well as legacy SGML-based HTML and, with slightly less convenience, any other text-based document type.

HTML and XML

If you are currently writing ‘old-style’ HTML, you’ll need to learn XHTML before you can use PXTL. You can have PXTL output pages in legacy-HTML if you want, but you still have to write the source templates in XHTML.

If you know HTML, XHTML is a pretty simple change. Here’s the summary:

All XHTML element and attribute names are in lower case.
All elements opened must be closed. HTML used to let you get away with not writing an end-tag for some elements; this doesn’t work any more. So for example you must use <p>...</p> to surround each paragraph instead of writing <p> between paragraphs.
Elements that are always empty, like <br>, still have to be closed. You don’t have to write an end-tag, though, you can just add a slash to the end of the start-tag — <br />.
All attributes must have values in quotes. Attributes that didn’t take values in HTML have to be given the same value as their name, for example, <input type="checkbox" checked="checked" />.
Embedded scripts and stylesheets aren’t ‘magic’, so they require some care if they have ‘<’ or ‘&’ characters in them.
If you use non-ASCII characters, save the file as UTF-8 (or UTF-16), unless you specify a different encoding with an XML declaration (<?xml version="1.0" encoding="..."?>).

Most importantly, though, XHTML documents must be ‘well-formed’: if you leave a tag open, forget a quotation mark, or make any of various other serious syntax mistakes, XML tools like PXTL will complain straight away rather than quietly guessing what you probably meant, as browsers do with old-style HTML.
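
To see what ‘complain straight away’ looks like, here is a minimal sketch that uses the parser from Python’s standard library rather than PXTL itself. The helper name is_well_formed is made up for this illustration; any XML parser should reject the same mistakes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Old-style HTML habits that an XML parser refuses:
print(is_well_formed('<p>one<br>two</p>'))        # <br> is never closed
print(is_well_formed('<input type=checkbox />'))  # attribute value not quoted

# The XHTML equivalents parse cleanly:
print(is_well_formed('<p>one<br />two</p>'))
print(is_well_formed('<input type="checkbox" checked="checked" />'))
```

The first two calls report a failure and the last two succeed, which is the behaviour to expect from PXTL when it reads a template.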

XHTML issues in PXTL

There are two ways to specify the language a document is in, xml:lang and lang, and which one(s) should be used depends on the document type. If you want to specify a language in a PXTL template, just use xml:lang on its own and PXTL will work out the right code to output.

If you include embedded scripts and stylesheets using the <script> and <style> elements, don’t use the old-style HTML hiding trick of putting the content in comments! If you need to hide scripts/styles from ancient browsers, or you want to include ‘<’ or ‘&’ characters in the script, use a CDATA block instead:

<script type="text/javascript"><![CDATA[ window.alert('fish&chips'); ]]></script>

PXTL will work out the appropriate code to write for the type of document being written.

Starting out

Let’s write a basic (pretty pointless) template that will output some XHTML. Start with the root element, which is as usual <html>:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:px="http://www.doxdesk.com/pxtl" >

The xmlns attribute you may be familiar with. It just tells us that any normal elements we use in this document are going to be XHTML ones. You don’t necessarily need to say this, as web browsers can work that out for themselves, but it’s a good idea when you’re writing mixed XML documents.

The xmlns:px attribute is very important, though. It specifies that if we start an element or attribute name with “px:”, it is a command for PXTL. All the examples here assume the PXTL namespace is bound to ‘px:’ like this.

Continuing the template:

<?px fruits= ['orange', 'lime', 'sofa'] ?>

In case you haven’t met one before, this is an XML processing instruction. It starts with <?, then a ‘target name’, which specifies what kind of processing instruction it is, and ends with ?>.

The target name px (or px_code) is used by PXTL to indicate a line of Python code. Here, it is used to assign a list to the variable fruits.

<head> <title> I have <?_ len(fruits) ?> fruits </title> </head>

PXTL uses the processing instruction target name _, also known as px_text, to evaluate a Python expression and add the results to the document as plain text.

<body>
  <px:for item="fruit" in="fruits">
    <p> Fruit: <?_ fruit ?> </p>
  </px:for>
</body>
</html>

The px:for element loops over its contents repeatedly just like the Python for statement, with the variable specified by item being given each value from in.

Running the template

Exactly how you invoke a PXTL implementation is not specified by the language itself, but for example with the pxtl package you might save the above template as template.px and write process.py in the same directory:

import pxtl
pxtl.processFile('template.px')

Running this would give you:

<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title> I have 3 fruits </title> </head>
<body>
  <p> Fruit: orange </p>
  <p> Fruit: lime </p>
  <p> Fruit: sofa </p>
</body>
</html>

You can also pass global variables into the template for code inside it to access. If you write:

pxtl.processFile('template.px', globals= {'spam': 2})

then all code in template.px will see a global variable called spam with an initial value of 2. This is the usual way of passing parameters into a template.
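
How an implementation hands the dictionary to the template code is its own business. As a rough analogy only, Python’s built-in exec shows the same idea of running code against a caller-supplied set of globals; the code string and the name doubled are invented for this sketch.

```python
# Rough analogy only: run a snippet of code with a caller-supplied globals dict.
template_globals = {'spam': 2}

# The snippet can read spam, and anything it assigns lands in the same dict.
exec("doubled = spam * 2", template_globals)

print(template_globals['doubled'])
```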

Processing instructions

We’ve already used a px_code PI (or rather, just px, which is a shortcut for the same thing) to execute a line of Python at one point in a template:

<?px fruits= ['orange', 'lime', 'sofa'] ?>

To execute more than one line of Python at once, we have to start the code part of the PI (that is, after the space) with a colon and new line:

<?px :
  fruits= ['orange', 'lime']
  fruits.append('sofa') # FIXME: not a fruit
?>

Any Python code can go in a code block, including functions, classes and imports. The one exception is the global directive, which isn’t really a proper ‘statement’ in Python terms and so can’t be guaranteed to be supported completely accurately.

The code has to finish by the end of the code block; you can’t leave a structure (for example an if block) open at the end of a block like you do in ASP or JSP to conditionally include some of the document. We use PXTL elements for that instead — more on that later.

Another special processing instruction target provided by PXTL is px_note (or __ for short). This PI, uniquely, contains no actual Python code, because it’s just a comment. Unlike normal XML <!-- comments -->, however, these comments aren’t included in the final document.

<!-- link to fruit catalogue page -->
<?__ FIXME: goes to wrong page ?>
<a href="catalogue.html">more...</a>

Output PIs

The rest of the processing instructions are for including content from Python in the final document. We’ve already seen px_text, also known as _:

<p> Fruit: <?_ fruit ?> </p>

This PI will automatically encode any special characters in the value to be output, so you don’t have to worry about the ampersand in a fruit called “apples&pears”, for example. Sometimes, though, you may want to write markup directly without any encoding. The PI to do this is px_mark:

Comment: <?px_mark comment.replace('\n', '<br />') ?>

(Be very careful with px_mark, as outputting markup directly can cause unforeseen problems. In the above example, a ‘<’ or ‘&’ character in comment will result in broken markup, and should have been checked for. If user input is included in web pages this way, you’ve got a potential ‘cross-site scripting’ security risk. This is an extremely common problem affecting web applications.)
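
A minimal sketch of the safer approach, using xml.sax.saxutils.escape from the Python standard library as a stand-in for the encoding that px_text performs. PXTL’s own encoder may differ in detail, and comment_to_markup is a hypothetical helper. The value is escaped first and only then given its <br /> tags, so nothing the user typed can survive as markup.

```python
from xml.sax.saxutils import escape

def comment_to_markup(comment):
    """Escape special characters first, then turn newlines into <br /> tags."""
    return escape(comment).replace('\n', '<br />')

# What text-style encoding does to a tricky value:
print(escape('apples&pears'))

# A comment containing '<' and '&' no longer breaks the markup:
print(comment_to_markup('1 < 2\nfish & chips'))
```

A template could call a helper like this inside px_mark in place of the bare replace shown above.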

There are other PI targets available which encode the output in different ways. The most commonly used is px_upar. This encodes the given value in such a way that it can be included in a parameter inside a URI query string.

<p> Google search URL for <?_ term ?>: http://www.google.com/search?q=<?px_upar term ?> </p>
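
To give a feel for what this kind of encoding involves, here is a sketch with urllib.parse.quote from the Python 3 standard library. It is a stand-in, not PXTL’s encoder: px_upar may, for instance, write spaces differently. The search term is invented.

```python
from urllib.parse import quote

term = 'fish & chips?'

# safe='' makes quote() escape every reserved character, including '/'.
url = 'http://www.google.com/search?q=' + quote(term, safe='')
print(url)
```

Without the encoding, the ‘&’ in the term would be read as the start of a second query parameter.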

There’s also px_jstr and px_cstr. These are for including text in string literals in JavaScript and CSS. This works for code in HTML <script> and <style> blocks as well as style and event (onclick etc.) attributes.

<?px message= 'Hello '+name ?>
<script type="text/javascript">
  window.status= 'annoying message: <?px_jstr message ?>';
</script>

The final PI is px_name. This is needed — rarely — when an arbitrary string has to be used as an element’s ID, because IDs (and other things that are XML Name tokens) have extra restrictions about what characters can be in them.

Pseudo-PIs

In XML, processing instructions can only be used in element content. However, we often want to put Python code inside attribute values, and PIs can’t go there.

Instead, PXTL allows ‘pseudo-PIs’ in attribute values. They look and work just like PIs, except they use curly brackets instead of angle brackets.

<a
  title="Google search for {?_ term ?}"
  href="http://www.google.com/search?q={?px_upar term ?}"
>
  Find <?_ term ?>
</a>

You can use most of the same target names in pseudo-PIs as normal PIs: px_text, px_note, px_upar, px_jstr, px_cstr and px_name, plus the shortcuts _ and __. But px_code is not available, for tedious technical reasons (plus you probably wouldn’t want it in attributes anyway).

Additionally there is one further target name that can only be used in pseudo-PIs. The content of px_if is a conditional expression. This must evaluate true, or the attribute is removed completely.

<option value="m" selected="selected{?px_if user.sex=='m' ?}"> Male </option>

There’s one catch when working with pseudo-PIs (and, for that matter, all Python code in attributes, which happens a lot with PXTL elements and attributes). It is this: attribute values cannot contain unescaped ampersands, less-than symbols or quotes, so you can’t use those characters in your code.

You can get around this by using &amp;, &lt; and &quot; respectively when you write Python inside attribute values, but that can make the Python code quite tricky to read, so it’s usually better to avoid it (or, at worst, move the problem code into a ‘real’ processing instruction).

The ampersand is very rare in Python anyway; if you really need the bitwise-and operator, operator.and_(a, b) is an alternative way of writing it. Less-than can be avoided by switching the comparison around and using greater-than instead. And quotes are rarely a problem in practice as both XML and Python allow you to mix-and-match single- and double-quotes.
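
For instance, in plain Python (the variable names are invented), the ampersand operator and a less-than comparison can each be rewritten without the awkward character:

```python
import operator

flags, mask = 6, 3
# Same result as the expression  flags & mask,  with no ampersand to escape.
print(operator.and_(flags, mask))

count, limit = 2, 5
# Same test as  count < limit,  turned around to use greater-than.
print(limit > count)
```

Either form can then be written in an attribute value as-is, for example test="limit > count".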

pxtl.write

When any PXTL template is run, the code inside it is given an extra global variable called pxtl. This is an object which contains a few useful shortcuts for template code, and the function write.

pxtl.write can only be called inside a px_code (px) processing instruction. It puts content into the document at the place the PI was:

<div class="userdetails">
  <?px :
    user= users.get(username, None)
    if user is None:
      user= DEFAULT_USER
      if DEBUG:
        pxtl.write('Debug: user not found')
  ?>
  Info: <?_ user.details ?>
</div>

pxtl.write takes an optional second argument, coding, which indicates what sort of encoding to use on the value; this can be 'text', 'mark', 'upar', 'jstr', 'cstr' or 'name', with the same effect as the PIs of the same name. If omitted, this defaults to 'text'.

PXTL elements

Controlling template flow from Python code is done with elements in the PXTL namespace (which we have bound to the prefix ‘px:’).

Conditional elements

The simplest is if, which evaluates its test attribute, and includes its contents in the output document if the condition is true. Otherwise, its entire contents are discarded, without running any Python code that might be in them.

<px:if test="hasChildren">
  <p class="warning">
    <em>Warning</em>: has child items, really delete?
  </p>
</px:if>

There are also else and elif elements, which work like the Python statements of the same name:

<px:if test="user is None">
  <em>(not logged in)</em>
</px:if>
<px:elif test="user.isAdmin">
  (Administrator)
</px:elif>
<px:else>
  (Normal user)
</px:else>

An elif or else element must be the next element sibling of another conditional element: there can be some text or a comment or PI in between them, but not another element. You can of course put conditional elements inside other elements, where they work without affecting their parent elements — it’s the same as nesting if statements in Python.

For convenience, PXTL also provides two further conditional elements that aren’t from Python, anif and orif. These can make some kinds of structure easier to write without having to resort to temporary variables (or exceptions, which you can’t really use in templates anyway).

An anif element is included in the document if its test attribute evaluates true and the previous element was included. The main advantage this has over nesting if clauses inside each other is that you can write an else (or elif) clause afterwards which will work if any of the previous clauses were not included. So in this example, the text ‘(anonymous)’ will appear if the user object is None or has no realname:

<px:if test="user is not None">
  <?_ user.username ?>
</px:if>
<px:anif test="user.realname!=''">
  <em><?_ user.realname ?></em>
</px:anif>
<px:else>
  <em>(anonymous) </em>
</px:else>

An orif element’s contents are included if the previous element was included or the test attribute evaluates true, so you can let one successful conditional element ‘fall through’ to the next.

You can combine elif, anif and orif elements in any order (see the language specification for the exact rules). It’s probably best not to use them all together, though, as the resulting code could be quite confusing.
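
One way to picture how if, anif, orif and else chain together is the following plain-Python sketch. It is only a reading of the descriptions above, not the rules from the language specification; elif is left out, and the names run_chain, name_chain and User are invented. Each clause looks at whether the previous clause was included, and a test is only evaluated when its result is actually needed.

```python
def run_chain(clauses):
    """Decide which clauses of a conditional chain are included.

    clauses is a list of (kind, test) pairs; test is a zero-argument
    callable, or None for 'else'. Returns one boolean per clause.
    """
    included = []
    success = False  # was the previous clause included?
    for kind, test in clauses:
        if kind == 'if':
            success = bool(test())
        elif kind == 'anif':
            # Short-circuits: the test is skipped if the previous clause failed.
            success = success and bool(test())
        elif kind == 'orif':
            success = success or bool(test())
        elif kind == 'else':
            success = not success
        else:
            raise ValueError('unsupported clause: %r' % kind)
        included.append(success)
    return included

class User:
    def __init__(self, username, realname):
        self.username = username
        self.realname = realname

def name_chain(user):
    # Mirrors the username / realname / (anonymous) example above.
    return run_chain([
        ('if', lambda: user is not None),
        ('anif', lambda: user.realname != ''),
        ('else', None),
    ])

print(name_chain(None))
print(name_chain(User('bob', '')))
print(name_chain(User('bob', 'Bob Fish')))
```

Note that with no user at all the anif test is never run, so looking up user.realname cannot fail.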

Iterator elements

The next-most-common elements are iterators, which include their content repeatedly. We’ve already seen the basic use of px:for:

<px:for item="fruit" in="fruits">
  <p> Fruit: <?_ fruit ?> </p>
</px:for>

The for element also has an optional attribute index, which takes a variable (or anything else you can assign to, for that matter) to hold the loop index. This starts at 0 the first time around the loop. This example would shade every other table row:

<px:for item="product" in="getAllProducts()" index="ix">
  <tr class="{?_ ['white', 'shaded'][ix % 2] ?}row">
    <td><?_ product.name ?></td>
    <td><?_ product.price ?></td>
  </tr>
</px:for>

Instead of specifying an in list, you can choose to give a range. This attribute takes a list of parameters in the same form as those passed to the Python built-in range function, and is just a shortcut for in="range(...)". Also the item attribute is optional, so if you want to loop a certain number of times, you can write:

<px:for range="nStars">
  <img alt="*" src="/img/star.gif" />
</px:for>
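
If it helps to think in terms of ordinary Python, the px:for variations correspond roughly to the loops below. This is a sketch with invented sample data, not a description of how PXTL implements the element.

```python
# Plain-Python counterparts of the px:for examples (sample data invented).
products = ['apple', 'banana', 'cherry']

# item="product" in="..." index="ix": enumerate supplies the 0-based index.
row_classes = [['white', 'shaded'][ix % 2] + 'row'
               for ix, product in enumerate(products)]
print(row_classes)

# range="nStars": loop a fixed number of times without needing the item.
nStars = 3
stars = ''.join('*' for _ in range(nStars))
print(stars)
```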

The other iterator element is while. As with the Python statement of the same name, it executes its contents until its condition — specified in the test attribute — evaluates false.

breadcrumb trail:
<px:while test="page is not None">
  <a href="{?_ page.url ?}"> <?_ page.name ?> </a>,
  <?px page= page.parent ?>
</px:while>

Like px:for, an optional index attribute gives a variable to write the iteration number to.

There is also an optional min attribute, which evaluates to a number: the minimum number of iterations to run before the test attribute is first checked. This is 0 by default so the condition is checked before every loop, but setting it to 1 gives you the equivalent of a post-test loop. (This is available in many languages under the name do...while; in Python one usually writes while True:...if not condition: break, but break is not available in templates.)

There’s one more trick with the iterator elements: they also count as conditional elements, which are considered ‘successful’ if their body executes at least once. So you can put an element like px:else after px:for or px:while, and the else-clause will get run if the loop iterated zero times. (Note: you can put an else after a loop in Python too, but that means a subtly different thing to do with break, which you don’t get in PXTL.)
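
The min attribute and the ‘loop as conditional’ rule can be pictured together in plain Python. This is a sketch of the behaviour described above under the assumption that the test is simply skipped for the first min passes; run_while and its parameters are invented names.

```python
def run_while(test, body, min_iterations=0):
    """Run body while test() is true, skipping the test for the first
    min_iterations passes. Returns the number of iterations run."""
    index = 0
    while index < min_iterations or test():
        body(index)  # index plays the part of the optional index attribute
        index += 1
    return index

seen = []

# min of 1 behaves like a post-test loop: the body runs once although
# the test is false from the start.
print(run_while(lambda: False, seen.append, min_iterations=1), seen)

# With the default min of 0 the body never runs, which is the case where
# a following px:else clause would be included.
if run_while(lambda: False, seen.append) == 0:
    print('else clause would run')
```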

Subtemplates

A subtemplate is like a Python function. You define it with a px:def element, then its contents are executed every time you call it, with px:call. The name of the variable to use for the subtemplate function is specified in the fn attribute.

<px:def fn="header">
  <tr><th> User </th> <th> E-mail </th> <th> Joined </th> </tr>
</px:def>
<table id="onlineusers">
  <px:call fn="header" />
  ...
</table>
<table id="offlineusers">
  <px:call fn="header" />
  ...
</table>

Arguments can be passed to the subtemplate function using the args attribute, which takes the same form as an argument list in a Python function: a parameter list on px:def, and the values to pass on px:call.

Subtemplates can do some powerful things that most templating systems make very difficult. Here’s an example of a recursive subtemplate used to print a hierarchy of comments in a forum:

<px:def fn="writeComments" args="parentComment">
  <px:for item="comment" in="parentComment.children">
    <h3> <?_ comment.title ?> </h3>
    <p> <?_ comment.body ?> </p>
    <div class="replies">
      <px:call fn="writeComments" args="comment" />
    </div>
  </px:for>
</px:def>
<h2> Forum topic: <?_ topicRoot.title ?> </h2>
<px:call fn="writeComments" args="topicRoot" />

You can define subtemplates inside other subtemplates, with the same effect as nesting functions in Python. You’ll get nested scopes over subtemplates if the version of Python you’re using would do nested scopes over functions.
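
Since a subtemplate is like a function, the recursive comment example has a direct plain-Python counterpart. The sketch below builds the same nesting as a string; the dictionary layout and sample comments are invented, and xml.sax.saxutils.escape stands in for the encoding px_text would apply.

```python
from xml.sax.saxutils import escape

def write_comments(parent):
    """Return markup for every reply below parent, recursing into replies."""
    parts = []
    for comment in parent['children']:
        parts.append('<h3>%s</h3>' % escape(comment['title']))
        parts.append('<p>%s</p>' % escape(comment['body']))
        # The recursive call plays the part of the nested px:call.
        parts.append('<div class="replies">%s</div>' % write_comments(comment))
    return ''.join(parts)

topic_root = {
    'title': 'Fruit', 'body': '', 'children': [
        {'title': 'Limes', 'body': 'Sour & green', 'children': [
            {'title': 'Re: Limes', 'body': 'Agreed', 'children': []},
        ]},
    ],
}
print(write_comments(topic_root))
```

The recursion stops by itself when a comment has no children, exactly as the px:for in the template simply iterates zero times.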

Imports

The px:import element can be used to include content from an external document, referenced by the src attribute. This is a URI, not a filename, so Windows users should take care to use ‘/’ not ‘\’, and give absolute filenames in the form Python likes, ‘file:///C|/dir/filename.px’. Relative URIs are taken as relative to the current template; URI schemes other than ‘file:’ are not guaranteed to be supported.

Like all PXTL attributes, src and type contain Python expressions, so if you want a simple string remember to put quotes around it!

<px:import src="'includes/header.px'" type="'text/x.pxtl+xml'" />

The type attribute is optional; if you don’t include it PXTL will work it out from the type of the document itself. (In most filesystems, this is done by looking at the file extension.) The type dictates what sort of import will happen:

if it’s ‘text/plain’ (.txt), the file will be included as plain text, with special characters suitably encoded for XML;
if it’s an XML or HTML type — ‘text/html’, ‘application/xml’ and so on (.html etc.) — the file will be included directly in the output, with markup as-is;
if it’s PXTL — ‘text/x.pxtl+xml’ (.px), or any other unknown type — the file is transformed as a PXTL template, and the results included.
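
The three cases above amount to a small decision table. The sketch below is one hypothetical way to write it down in Python: the extension table, the function name import_mode and the short list of markup types are all assumptions made for illustration, and a real implementation may recognise more types or consult the platform’s own type registry.

```python
import posixpath

# Hypothetical extension table, for illustration only.
EXTENSION_TYPES = {
    '.txt': 'text/plain',
    '.html': 'text/html',
    '.xml': 'application/xml',
    '.px': 'text/x.pxtl+xml',
}

def import_mode(src, media_type=None):
    """Return 'text', 'markup' or 'template' for an imported document."""
    if media_type is None:
        # No explicit type: fall back to the file extension of the URI.
        extension = posixpath.splitext(src)[1].lower()
        media_type = EXTENSION_TYPES.get(extension)
    if media_type == 'text/plain':
        return 'text'      # included as text, special characters encoded
    if media_type in ('text/html', 'application/xml'):
        return 'markup'    # included directly, markup as-is
    return 'template'      # PXTL or any unknown type: run as a template

print(import_mode('includes/header.px'))
print(import_mode('notes.txt'))
print(import_mode('fragment.html'))
```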

When importing a PXTL template, a couple of other optional attributes may be used. Firstly, a globals attribute specifies a dictionary to use as global scope for the imported template. This is just like passing variables into the original template with the globals parameter to the processFile function of the reference implementation.

Secondly, the as attribute names a variable to assign a module-like object representing the imported document. All the global variables defined by the template after it finished running will be accessible as properties of this object; this means an imported template can define subtemplate functions for an importer template to use. For example, one could write an importable template that consisted of nothing but subtemplate definitions for page parts:

<px:none xmlns:px="http://www.doxdesk.com/pxtl">
  <px:def fn="header" args="title">
    <head>
      <title> MySite: <?_ title ?> </title>
      <script src="/script/navpop.js" type="text/javascript"></script>
      <link rel="stylesheet" href="../style/site.css" type="text/css" />
    </head>
  </px:def>
  <px:def fn="footer">
    <address> © 2003 My Example Site Inc. </address>
  </px:def>
</px:none>

The px:none element is a special element that PXTL will always remove from the final output. It’s quite common to use px:none as the root element of a template to be imported: XML requires a single root element, but an imported template might want to output any number of elements — in this example, it outputs none, merely defining subtemplates for later use.

We could import and use the above template like this:

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:px="http://www.doxdesk.com/pxtl"
>
  <px:import src="'pageparts.px'" as="pageparts"/>
  <px:call fn="pageparts.header" args="'Page title'" />
  <body>
    ...
    <px:call fn="pageparts.footer" />
  </body>
</html>

PXTL attributes

As well as elements in the PXTL namespace, there are attributes from the PXTL namespace, which you can use on any other elements to alter them.

Conditional attributes

The conditionals px:if, px:elif, px:anif, px:orif and px:else are also available as attributes. But they work slightly differently when you include a PXTL conditional attribute on an element:

<a href="index.html" px:if="currentPage!='home'">
  <img src="/img/nav/home.gif" alt="home" />
</a>

The contents of an element that has a PXTL conditional attribute will always be included in the document. However, if the attribute ends up with an unsuccessful result (ie. a px:if attribute’s value evaluates to false), the element itself is removed from the document, and replaced by its children.

So, in the above example, there will always be a ‘home’ image. However, only if the currentPage variable is something other than 'home' will the image be in a clickable link.
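
The ‘replaced by its children’ step can be tried out with the DOM implementation in Python’s standard library. This is only a sketch of the effect on the document tree, not PXTL’s own code; unwrap is an invented helper and the markup is a cut-down version of the example.

```python
from xml.dom import minidom

def unwrap(element):
    """Remove element from its parent, leaving its children in its place."""
    parent = element.parentNode
    while element.firstChild is not None:
        # insertBefore moves the child out of element and into parent.
        parent.insertBefore(element.firstChild, element)
    parent.removeChild(element)

doc = minidom.parseString(
    '<div><a href="index.html"><img alt="home"/></a></div>')
link = doc.getElementsByTagName('a')[0]

currentPage = 'home'
if not (currentPage != 'home'):  # the condition on the link failed
    unwrap(link)

print(doc.documentElement.toxml())
```

The image survives but the link around it is gone, which is what a visitor already on the home page should see.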

When using px:else, px:elif, px:anif and px:orif attributes, the ‘previous clauses’ they look at are parent elements with conditional attributes, rather than earlier siblings. So with conditional attributes, the if...else clauses are nested rather than one after another as with the elements.

<ul px:if="isCountable">
  <ol px:else="">
    <li> item </li>
  </ol>
</ul>

In this example, the list item will be contained in either an ordered or unordered list, depending on the value of isCountable. Else-clauses do not need conditions, so any contents put in the value of a px:else attribute are ignored.

Alteration attributes

Elements can be altered in-place with the px:tagname and px:attr attributes.

The px:tagname attribute simply changes an element’s name to an evaluated Python string expression. This can include a namespace prefix if you like.

<h1 px:tagname="'h'+str(headLevel)">Info</h1>

In this example the h1 tag name is always overwritten by the results of the expression, eg. ‘h3’.

Similarly, px:attr can be used to add, change or remove any attribute on the element. Its value should be a mapping from attribute name string (possibly including a namespace prefix) to either None, in which case no attribute is added, and any existing attribute of the same name is removed, or a value that will be used as the new string value for that attribute.

The following example writes an anchor that should still work in very old browsers that don’t understand the id attribute.

<?px :
  anchorAttr= 'id'
  if browserinfo.getHtmlLevel()<4:
    anchorAttr= 'name'
?>
<a px:attr="{anchorAttr: 'foo'}">About foo</a>
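
The add, change and remove rule for the px:attr mapping can be summarised in a few lines of plain Python. This is a sketch of the rule as described above, with attributes modelled as an ordinary dictionary; apply_attr is an invented name and namespace prefixes are not handled.

```python
def apply_attr(attributes, changes):
    """Return a copy of attributes with changes applied; None removes."""
    result = dict(attributes)
    for name, value in changes.items():
        if value is None:
            # None: no attribute is added and any existing one is removed.
            result.pop(name, None)
        else:
            # Anything else becomes the attribute's new string value.
            result[name] = str(value)
    return result

print(apply_attr({'href': '#foo', 'class': 'old'},
                 {'class': None, 'id': 'foo'}))
```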

Whitespace control

Sometimes excess whitespace is a problem. There are places where two tags must be written with no space in between; unfortunately, writing documents with no whitespace can lead to code that is difficult to read.

For this case, PXTL offers the px:space attribute. It can be specified on any element, even special PXTL elements. If the attribute has a true value, all whitespace in the element will be preserved. If a false value is used, any whitespace that isn’t part of some stretch of non-whitespace text will be removed. If the attribute is not specified, or has the value None, whitespace settings are the same as the parent element; by default, the root element preserves whitespace.

A common use in HTML is to line images up together with no gaps between them:

<px:for item="page" in="navigation" px:space="0">
  <a href="/{?_ page.id ?}.html">
    <img alt="{?_ page.title ?}" src="/img/nav/{?_ page.id ?}.gif" />
  </a>
</px:for>

Root attributes

There are two attributes that can only be used on the root element of a document: px:future and px:doctype. Additionally, px:doctype only actually has any effect when it’s on the root element of the main template being processed; it is ignored when seen in imported templates.

The px:doctype attribute allows a document type declaration to be attached to the final output document, so that it can be validated. (In the case of HTML, giving the right <!DOCTYPE> can also affect the way some browsers do page layout.)

The attribute value for px:doctype should evaluate to a tuple with up to four items. (See the Language specification for full details.) Normally, though, it is easier to use the shortcut values built into the global pxtl object.

Use px:doctype="pxtl.XHTML1S" to write XHTML 1.0 Strict documents, pxtl.HTML4T for HTML 4.01 Transitional, and so on.

The px:doctype attribute also specifies the MIME media type of the output document, and chooses the method of markup encoding — this can be normal XML, HTML-compatible XHTML, real old-style HTML, or no markup at all. This can be used to make templates for languages that are not XML-like, for example stylesheets:

<css xmlns:px="http://www.doxdesk.com/pxtl" px:doctype="pxtl.CSS">
  <?px myred= 'rgb(180, 40, 20)' ?>
  .warning {
    color: <?_ myred ?>;
    border: dotted 3px <?_ myred ?>;
  }
</css>

The px:future attribute allows access to Python future-features. In normal Python scripts you do this with the from __future__ import construct, but that’s not possible in a page template because the future-import can’t go right at the top of the script. Instead, put a list of desired features (in the same format as future-imports) in the px:future attribute on the root.

<html xmlns:px="http://www.doxdesk.com/pxtl" px:future="nested_scopes">
<body>
  <px:def fn="outer">
    <?px scope= 'Nested' ?>
    <px:def fn="inner">
      Scope in outer function: <?_ scope ?>
    </px:def>
    <px:call fn="inner" />
  </px:def>
  <px:call fn="outer" />
</body>
</html>

The above template makes use of the nested_scopes feature, which is on by default in Python 2.2 and later. In Python 2.1 the px:future attribute ensures nested scopes are turned on, both for normal Python functions and for subtemplates. In Python 2.0 it will generate an error, as no support for nested scopes is available.

Finally...

For any further technical questions, see the Language specification. Print out the Quick reference to keep an overview of the available features. Check out the documentation for the pxtl implementation package to see how to interface it with your software.

If that doesn’t cover it, e-mail the author!