XML Template Specification

The XML template used by Imprint contains the static portions of the text of the final document, along with all the placeholders for dynamically generated content.

There is no DTD or XMLNS for the template, for two reasons. First, all validation is done internally by the Imprint core, in a manner that is as lenient as possible. Any errors that can be forgiven will be, with a warning and a logged message. Second, it is possible to use the XML Tag API to extend the capabilities of the core processor without requiring modification of a hard-coded standard.

The XML format used by Imprint does not allow namespaces. Namespaced tags will be ignored with a warning, even if they are registered through the XML Tag API.

Warning

All tag and attribute names are case-sensitive. All builtin tags and attributes are lowercase. Names must appear in the XML exactly as shown in the spec.

Root

The file root is always the <imprint-template> tag. That being said, there is a proposal to make it configurable: Configurable XML Root Tag.

Attributes

Normally, each tag has a set of required and optional attributes. Omitting a required attribute immediately triggers a fatal error. Omitting an optional attribute just sets the default value when processing. Any extra attributes that are neither required nor optional are logged but otherwise completely ignored. In the tag descriptions below, all attributes are mandatory, unless suffixed by opt for “optional”.
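The attribute rules above can be sketched roughly as follows. This is only an illustration, not Imprint's actual validation code: the function check_attributes, the exception FatalTemplateError, and the empty-string default for format are all assumptions made for the example.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("imprint-sketch")


class FatalTemplateError(Exception):
    """Hypothetical stand-in for a fatal error."""


def check_attributes(element, required, optional):
    """Return the effective attributes of an element.

    required: iterable of attribute names that must be present.
    optional: dict mapping optional attribute names to default values.
    """
    attrs = dict(element.attrib)
    for name in required:
        if name not in attrs:
            raise FatalTemplateError(
                f"<{element.tag}> is missing required attribute {name!r}")
    effective = {name: attrs[name] for name in required}
    for name, default in optional.items():
        # A missing optional attribute falls back to its default.
        effective[name] = attrs.get(name, default)
    for name in attrs:
        if name not in effective:
            # Extra attributes are logged and otherwise ignored.
            log.warning("<%s>: ignoring unknown attribute %r", element.tag, name)
    return effective


# Names are case-sensitive, so "Format" is treated as an unknown attribute.
kwd = ET.fromstring('<kwd name="title" Format="10"/>')
effective = check_attributes(kwd, required=["name"], optional={"format": ""})
print(effective)
```

Because the lookup is an exact string match, the misspelled Format attribute is logged and dropped, and format keeps its default.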

In addition to the normal attributes that any tag may have, there are attributes that are processed by the engine itself. Currently, there is one such attribute:

role

Define the role of a tag and immediately make it referenceable. The role is the name of another tag that is referenceable by design. Among the builtin tags, <figure>, <table>, and sometimes <par> are referenceable by design. For more details on references, see the relevant section in the Tag API.

Normally, referenceable tags identify the target with an id attribute. Defining a role on a custom tag therefore implies that it must also have an id attribute in that case. Among the builtin tags, <segment-ref> is an exception, in that it requires either an id or a title. A tag with role="par" therefore does not require an id attribute. The rules for custom tags are defined similarly: the check for target identification attributes depends on what the role supports.

Tags

<break>

Insert a page-break. If placed in the middle of a run, this will be a true page break. Otherwise, this will be a section break that starts a new page.

Attributes

None

Content

No Content

<expr>

Evaluate a Python expression and create a new keyword. This tag can appear anywhere in the document. It temporarily suspends normal processing. Any text inside this tag will be evaluated as a Python expression, and the result will be assigned to the named keyword. All existing keywords, including those from prior <expr> tags, are available in the evaluation namespace.

Keywords computed in this manner are treated the same as User-Defined Keywords and take effect as soon as the closing tag is reached, but not before. It is therefore common practice to put all of the expressions at the beginning of the XML Template.

The purpose of this tag is to move common boilerplate keywords that depend entirely on other keywords into the XML Template, avoiding as much redundancy as possible.

System Keywords should never be set with this tag. System values may be used before the XML file is read, so values set here may not work as intended, for this and other reasons.

Warning

This tag runs arbitrary Python code, with direct access to the keyword definitions. Avoid making assignments within the tag itself (even implicit ones) unless you really know what you are doing!

Warning

Any coding errors in the content of this tag will cause a fatal error.

Attributes

name : Python Identifier
The name of the new keyword to create.
importsopt : List of module names
A space-separated list of modules to import before evaluating the expression in the tag. Failed imports will be logged as an error.

Content

Text Only
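The behavior described for this tag can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the Imprint implementation: evaluate_expr is a hypothetical helper, it handles only top-level module names in imports, and it evaluates against a copy of the keywords, whereas the warning above suggests that Imprint exposes the keyword definitions directly.

```python
import importlib
import logging

log = logging.getLogger("imprint-sketch")


def evaluate_expr(name, text, keywords, imports=""):
    """Evaluate text as a Python expression and store it under name."""
    # Assumption: work on a copy so the expression cannot rebind keywords.
    namespace = dict(keywords)
    for module_name in imports.split():
        try:
            namespace[module_name] = importlib.import_module(module_name)
        except ImportError:
            # Failed imports are logged as an error.
            log.error("could not import %r", module_name)
    # The new keyword only exists once evaluation has finished.
    keywords[name] = eval(text.strip(), namespace)


keywords = {"first": "Ada", "last": "Lovelace"}
evaluate_expr("author", "' '.join([first, last])", keywords)
evaluate_expr("root", "math.sqrt(16)", keywords, imports="math")
print(keywords["author"], keywords["root"])
```

In template terms, the first call corresponds to something like <expr name="author">' '.join([first, last])</expr>, with first and last defined as keywords beforehand.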

<figure>

Generate a figure using the selected handler, and insert it into the document. If Image Logging is enabled, a separate file with the image will be generated as well.

Figures are referenceable through the <figure-ref> tag.

Attributes

id : Python Identifier
The name of the Data Configuration dictionary for the figure. The name must appear in the IDC File. This is also the ID used by the <figure-ref> tag to link back to this tag.
handler : str
The full name of the figure handler class that will generate the content.
styleopt : Character Style
The name of the style of the run containing the figure. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
pstyleopt : Paragraph Style
The name of the style of the paragraph containing the figure. Must be defined in the DOCX Stub and be a paragraph style.
widthopt : int + {'in', 'px', 'cm', 'mm', 'pt', 'emu'}
The width of the figure. Units are optional, and default to inches ('in'). Suffixes can be separated from the number by optional whitespace.
heightopt : int + {'in', 'px', 'cm', 'mm', 'pt', 'emu'}
The height of the figure. Units are optional, and default to inches ('in'). Suffixes can be separated from the number by optional whitespace.

The attributes handler, style, pstyle, width and height can be overridden by keys with the same name in the Data Configuration for the figure. If neither width nor height is specified, the figure will be inserted as-is. If only one of them is specified, the figure will be scaled proportionally.
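The size rules for width and height might be handled along these lines. The helper names parse_dimension and scale are hypothetical, and the sketch assumes that unit suffixes are lowercase only, which the specification does not state explicitly.

```python
import re

UNITS = {"in", "px", "cm", "mm", "pt", "emu"}
# An integer, optional whitespace, then an optional unit suffix.
_DIMENSION = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def parse_dimension(text):
    """Split a width/height attribute into (value, unit)."""
    match = _DIMENSION.match(text)
    if match is None:
        raise ValueError(f"not a valid dimension: {text!r}")
    value = int(match.group(1))
    unit = match.group(2) or "in"  # units default to inches
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return value, unit


def scale(native_width, native_height, width=None, height=None):
    """Fill in a missing dimension so the aspect ratio is preserved."""
    if width is None and height is None:
        return native_width, native_height  # inserted as-is
    if height is None:
        return width, native_height * width / native_width
    if width is None:
        return native_width * height / native_height, height
    return width, height


print(parse_dimension("6"))
print(parse_dimension("15 cm"))
print(scale(400, 200, width=100))
```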

Content

No Content

<figure-ref>

Insert a reference to a <figure>, or another tag playing the role of a <figure>.

The reference will look something like Figure 1.2-1, depending on the configured heading depth and separators.

Attributes

id : Python Identifier
The id of the corresponding <figure>.

Content

No Content

<kwd>

Perform a keyword replacement. Keywords are defined as in the IPC File. The entire tag is replaced with the value of the keyword.

Attributes

name : Python Identifier
The name of the keyword to replace.
formatopt : format_spec
A format specification that can be used to convert the value into a string.

Content

No Content
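Assuming that format follows Python's format specification mini-language, the replacement amounts to a lookup followed by a call to the built-in format(). The helper replace_keyword and the sample keywords below are invented for this sketch.

```python
def replace_keyword(name, keywords, format_spec=""):
    """Return the text that would replace a <kwd> tag."""
    # An empty format_spec makes format() behave like str().
    return format(keywords[name], format_spec)


keywords = {"count": 42, "ratio": 0.125}
plain = replace_keyword("count", keywords)
padded = replace_keyword("count", keywords, "05d")
percent = replace_keyword("ratio", keywords, ".1%")
print(plain, padded, percent)
```

Under that assumption, <kwd name="ratio" format=".1%"/> would be replaced with the text 12.5%.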

<latex>

Insert a LaTeX formula into the document as an image. This tag is only available if the appropriate dependencies are installed.

Equations interrupt the current run if their run style does not match the style of the current run.

Attributes

styleopt : Character Style
The name of the style of the run containing the equation. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
pstyleopt : Paragraph Style
The name of the style to use for the equation’s paragraph, if it appears outside of an existing paragraph. Ignored if this tag appears inside a <par> tag. If used, must be defined in the DOCX Stub and be a paragraph style.
dpiopt : int
The DPI of the output image. Defaults to 96.
formatopt : Image Format
The output format, defaults to 'jpg'.
sizeopt : int or None
The text size, in points, used to render the equation. The default is to let LaTeX decide.

Content

Text Only. The text within the tag is parsed as a LaTeX equation.

<n>

Insert a line-break into the document. Line breaks only make sense within a paragraph, so this tag is ignored with a warning outside <par> tags.

Normally, this tag should appear inside a <run>. If not, the line break will be appended to the previous <run> in the current paragraph, or a new run will be created for it if it appears as the first tag.

Attributes

None

Content

No Content

<par>

Contains a paragraph of text. A paragraph is a collection of runs of differently formatted text, as well as some other elements. A paragraph can be styled with a paragraph-level style. Runs within a paragraph can have additional character-level styling that combines with or overrides the paragraph style.

Paragraphs should appear immediately under the document root to avoid warnings. Paragraphs that do not follow this rule (e.g., by being nested within each other) will be broken up unpredictably, with a slew of warnings.

Paragraphs are automatically referenceable if they have a heading style. Non-heading paragraphs must explicitly declare their role to be par just like any non-par tag posing as a heading. References can be made using the <segment-ref> tag.

Attributes

styleopt : Paragraph Style
The name of the style to use for this paragraph. Must be defined in the DOCX Stub and be a paragraph style.
idopt : Reference ID
The ID of this paragraph, if it is being used as the target of a <segment-ref>. If an ID is not supplied, the segment can be referenced only through the title attribute of the <segment-ref>. IDs will be ignored for any non-heading paragraph without an explicit role.
listopt : { continued, bulleted , numbered }

If this paragraph is a list item, set this attribute to one of the allowed values. Options are case insensitive, and can be truncated: bullet and NUM are both examples of valid options as well.

This attribute is required to make a list item. If it is missing, the paragraph will not be bulleted/numbered, even if a list style is applied to it. continued will continue the style/numbering of the previous list item, no matter how many other items were inserted in between. The other options always start a new list with the default style determined by the list type.

list-levelopt : int
An integer between zero and infinity specifying the depth of the current list item. Numbers are generated automatically. If the paragraph immediately preceding this one is a list item, the depth is preserved by default (as is the style). Otherwise, the default depth for a new list is 1. Missing depth levels get filled in automatically if the depth jumps by an increment of more than 1. Ignored if list is not set.
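The case-insensitive, truncatable matching of the list attribute can be illustrated with a simple prefix match. The function match_list_option is hypothetical, and the sketch assumes that any non-empty prefix is acceptable; the specification gives only bullet and NUM as examples.

```python
LIST_OPTIONS = ("continued", "bulleted", "numbered")


def match_list_option(value):
    """Map a possibly truncated, mixed-case value to a full option name."""
    key = value.strip().lower()
    matches = [opt for opt in LIST_OPTIONS if key and opt.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"not a valid list option: {value!r}")
    return matches[0]


print(match_list_option("bullet"))
print(match_list_option("NUM"))
print(match_list_option("Cont"))
```

Since the three options start with different letters, a prefix can never be ambiguous in this sketch.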

Content

Tags only. Any spurious text that is found will be placed into a run with the default style, along with a warning.

<run>

Contains a run of text, which is normally just characters, with optional keyword replacements. Runs are aggregated into <par> tags. A run can have a character-level style independent from all the other runs in the paragraph.

Attributes

styleopt : Character Style
The name of the style to use for this run of characters. Must be defined in the DOCX Stub and be a character style.

Content

Text and tags. Runs should always appear directly inside a <par> tag. Nested <run> tags will cause a fatal error. Runs outside a <par> tag will cause a warning and an implicit paragraph to be placed around them. Most other tags are allowed in a run, but may interrupt the run, which is then resumed afterwards with the same character style.
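As a rough illustration of the expected nesting, the following parses a small hypothetical template with the standard library and inspects where the runs sit. The style names, the id and the keyword name are invented for the example, and the checks are not how Imprint itself validates a template.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: the styles, id and keyword name are made up.
TEMPLATE = """
<imprint-template>
    <par style="Heading 1" id="intro">
        <run>Introduction</run>
    </par>
    <par style="Body Text">
        <run>This report was prepared by </run>
        <run style="Strong"><kwd name="author"/></run>
        <run>.<n/></run>
    </par>
</imprint-template>
"""

root = ET.fromstring(TEMPLATE)

# Runs that sit directly inside a <par>, as recommended.
direct_runs = root.findall("./par/run")

# A <run> anywhere below another <run> would be a fatal error.
nested_runs = [
    inner
    for outer in root.iter("run")
    for inner in outer.iter("run")
    if inner is not outer
]

print(len(direct_runs), len(nested_runs))
```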

<section>

Introduces a new section into the document. Sections define the page parameters in the document. This tag begins a new section (rather than enclosing a section), which will continue until the next <section> tag or the end of the document.

Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed on the following page with the same styles.

Attributes

orientationopt : { 'Portrait' , 'Landscape' }
The page orientation of this section. Values are case-insensitive.

The supported attributes for this tag may be expanded in the future.

Content

No Content

<segment-ref>

Insert a reference to a <par> with a heading style, or another tag playing the role of a heading <par>.

The reference will look something like Section 1.2-1: Title, depending on the configured prefix, heading depth and separators.

Attributes

idopt : Python Identifier
The id of the corresponding <par>.
titleopt : String
The actual text of the corresponding <par>.

One of id and title must be present. If both are present, they must refer to the same target, or a fatal error will occur.
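The id/title rule can be sketched as below. The names resolve_segment and FatalTemplateError, and the representation of headings as a list of dictionaries, are assumptions for illustration. The specification does not say what happens when a reference matches nothing, so the sketch simply returns None in that case.

```python
class FatalTemplateError(Exception):
    """Hypothetical stand-in for a fatal error."""


def resolve_segment(headings, ref_id=None, title=None):
    """Find the heading that a <segment-ref> points to."""
    if ref_id is None and title is None:
        raise FatalTemplateError("<segment-ref> needs an id or a title")
    by_id = None
    by_title = None
    for heading in headings:
        if ref_id is not None and by_id is None and heading["id"] == ref_id:
            by_id = heading
        if title is not None and by_title is None and heading["title"] == title:
            by_title = heading
    if ref_id is not None and title is not None and by_id is not by_title:
        raise FatalTemplateError("id and title refer to different targets")
    # Assumption: an unresolved reference yields None.
    return by_id if ref_id is not None else by_title


headings = [
    {"id": "intro", "title": "Introduction"},
    {"id": None, "title": "Methods"},
]
print(resolve_segment(headings, ref_id="intro")["title"])
print(resolve_segment(headings, title="Methods")["title"])
```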

Content

No Content

<string>

Generates a dynamic string based on the selected handler. Strings are expected to appear within a <run>. Any other location will generate a warning.

This tag is similar to <kwd>, except that it creates content based on a dynamic runtime configuration rather than just the static mapping of keywords.

Attributes

id : Python Identifier
The name of the Data Configuration dictionary for the string. The name must appear in the IDC File.
handler : str
The full name of the string handler class that will generate the content.

Content

No Content

<table>

Generates a table using the selected handler. Tables are constructed directly in the document, so any errors generated by the handler will result in a table stub along with the alt-text being placed in the document.

Tables are stand-alone entities. If this tag appears inside a <run> or <par> tag, a warning will be logged, and the paragraph and character styles will be resumed as necessary after the table.

Tables are referenceable through the <table-ref> tag.

Attributes

id : Python Identifier
The name of the Data Configuration dictionary for the table. The name must appear in the IDC File. This is also the ID used by the <table-ref> tag to link back to this tag.
handler : str
The full name of the table handler class that will generate the content.
styleopt : Table Style
The name of the style to use for this table. Must be defined in the DOCX Stub and be a table style.

Content

No Content

<table-ref>

Insert a reference to a <table>, or another tag playing the role of a <table>.

The reference will look something like Table 1.2-1, depending on the configured heading depth and separators.

Attributes

id : Python Identifier
The id of the corresponding <table>.

Content

No Content

<toc>

Insert a Table of Contents (TOC) into the document. Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed after the TOC with the same styles.

Attributes

minopt : int
The minimum heading level that the TOC supports. Defaults to 1.
maxopt : int
The maximum heading level that the TOC supports. Defaults to 3.
styleopt : Paragraph Style
The name of the style to use for the heading paragraph within the TOC. Must be defined in the DOCX Stub and be a paragraph style.

Content

Text Only. The text will be aggregated without line breaks and used as the heading of the TOC. If omitted, defaults to nothing.

Extensions

Additional tags may be registered through the XML Tag API. New tags may not conflict with existing names, but otherwise have no real restrictions.

Glossary

The following terms are used frequently throughout this document:

error
A logged message that means that the current operation was aborted. The remainder of the document will still be processed.
fatal error
An error that is unrecoverable. In addition to being logged and aborting the current operation, the remainder of the document will not be processed.
Image Format
A short string indicating an image format for conversion tools. Common formats include 'jpg', 'png', 'bmp', etc. Most Imprint features will default to either JPG or PNG format.
No Content
Nesting a tag or placing text in a tag that has this content description will cause a fatal error. The tag must effectively be of the form <tag/> or <tag></tag>. Whitespace is not considered to be content, so it may be present between an opening and closing tag.
referenceable
A tag is referenceable if it has a role attribute, or if it has reference functionality built into it. For more information on references, see the corresponding section in the Tag API description: References.
Text Only
Nesting a tag in a tag that has this content description will cause a fatal error.