XML Template Specification¶
The XML template used by Imprint contains the static portions of the text of the final document, along with all the placeholders for dynamically generated content.
There is no DTD or XMLNS for the template, for two reasons. First, all validation is done internally by the Imprint core, in a manner that is as lenient as possible: any error that can be forgiven is forgiven, with a warning and a logged message. Second, the XML Tag API can be used to extend the capabilities of the core processor without requiring modification of a hard-coded standard.
The XML format used by Imprint does not allow namespaces. Namespace tags will be ignored with a warning, even if they are registered through the XML Tag API.
Warning
All tag and attribute names are case-sensitive. All builtin tags and attributes are lowercase. Names must appear in the XML exactly as shown in the spec.
Root¶
The file root is always the <imprint-template> tag. That being said, there is a proposal to make it configurable: Configurable XML Root Tag.
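As a rough illustration of the root requirement, the sketch below parses a minimal template with Python's standard `xml.etree.ElementTree` and checks the root tag. The helper name `check_root`, the sample template text, and the style name in it are assumptions made for this example; they are not part of Imprint.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal template. Only the tag names come from the spec;
# the style name "Heading 1" is an assumption for illustration.
TEMPLATE = """\
<imprint-template>
    <par style="Heading 1">
        <run>Introduction</run>
    </par>
</imprint-template>
"""

EXPECTED_ROOT = "imprint-template"


def check_root(xml_text):
    """Return True if the document root is the expected tag.

    The comparison is case-sensitive, matching the rule that all tag
    names must appear exactly as shown in the spec.
    """
    root = ET.fromstring(xml_text)
    return root.tag == EXPECTED_ROOT
```

A root spelled with different capitalization, such as `<Imprint-Template/>`, does not pass this check.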
Attributes¶
Normally, each tag has a set of required and optional attributes. Omitting a required attribute immediately triggers a fatal error. Omitting an optional attribute just sets the default value when processing. Any extra attributes that are neither required nor optional are logged but otherwise completely ignored. In the tag descriptions below, all attributes are mandatory, unless suffixed by opt for “optional”.
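The three attribute rules above can be sketched roughly as follows. This is not the Imprint implementation: `FatalTemplateError`, `resolve_attributes`, and the logger name are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger("attribute_sketch")  # hypothetical logger name


class FatalTemplateError(Exception):
    """Hypothetical stand-in for a fatal error raised by the processor."""


def resolve_attributes(tag, given, required, optional):
    """Apply the required/optional/extra rules to one tag.

    given:    dict of attributes found in the XML
    required: iterable of mandatory attribute names
    optional: dict mapping optional attribute names to default values
    """
    missing = [name for name in required if name not in given]
    if missing:
        # Omitting a required attribute is fatal.
        raise FatalTemplateError(
            f"<{tag}> is missing required attribute(s): {', '.join(missing)}"
        )

    resolved = dict(optional)  # start from the defaults
    for name, value in given.items():
        if name in required or name in optional:
            resolved[name] = value
        else:
            # Extra attributes are logged but otherwise ignored.
            logger.warning("Ignoring unknown attribute %r on <%s>", name, tag)
    return resolved
```

For example, `resolve_attributes("latex", {"dpi": "300", "bogus": "x"}, (), {"dpi": "96", "format": "jpg"})` keeps the supplied `dpi`, fills in the default `format`, and drops `bogus` with a warning.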
In addition to the normal attributes that any tag may have, there are attributes that are processed by the engine itself. Currently, there is one such attribute:
role¶
Define the role of a tag and immediately make it referenceable. The role is the name of another tag that is referenceable by design. Among the builtin tags, <figure>, <table>, and sometimes <par> are referenceable by design. For more details on references, see the relevant section in the Tag API.
Normally, referenceable tags identify the target with an id attribute. Defining a role on a custom tag therefore implies that it must also have an id attribute. Among the builtin tags, <segment-ref> is an exception, in that it requires either an id or a title. A tag with role="par" therefore does not require an id attribute. The rules for custom tags are defined similarly: the check for target identification attributes depends on what the role supports.
Tags¶
<break>¶
Insert a page-break. If placed in the middle of a run, this will be a true page break. Otherwise, this will be a section break that starts a new page.
Attributes¶
None
Content¶
<expr>¶
Evaluate a Python expression and create a new keyword. This tag can appear anywhere in the document. It temporarily suspends normal processing. Any text inside this tag will be evaluated as a Python expression, and the result will be assigned to the named keyword. All existing keywords, including those from prior <expr> tags, are available in the evaluation namespace.
Keywords computed in this manner are treated the same as User-Defined Keywords and take effect as soon as the closing tag is reached, but not before. It is therefore common practice to put all of the expressions at the beginning of the XML Template.
The purpose of this tag is to abstract away common boiler-plate keywords that depend entirely on other keywords into the XML Template to avoid as much redundancy as possible.
System Keywords should never be set with this tag. System values may be used before the XML file is read, and may therefore not work as intended for this and other reasons.
Warning
This tag runs arbitrary Python code, with direct access to the keyword definitions. Avoid making assignments within the tag itself (even implicit ones) unless you really know what you are doing!
Warning
Any coding errors in the content of this tag will cause a fatal error.
Attributes¶
- name : Python Identifier
- The name of the new keyword to create.
- imports (opt) : List of module names
- A space-separated list of modules to import before evaluating the expression in the tag. Failed imports will be logged as an error.
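The evaluation described above might look roughly like the following sketch. It is an assumption-laden illustration, not Imprint's code: the function `evaluate_expr`, the logger name, and the choice to evaluate in a copy of the keyword dictionary are all made up for this example.

```python
import importlib
import logging

logger = logging.getLogger("expr_sketch")  # hypothetical logger name


def evaluate_expr(name, expression, keywords, imports=""):
    """Evaluate `expression` and store the result in `keywords[name]`.

    keywords: dict of the keywords defined so far
    imports:  space-separated module names, as in the `imports` attribute
    """
    # Work on a copy so that names added by eval() (such as __builtins__)
    # do not leak into the keyword dictionary. This is a design choice of
    # the sketch, not documented behavior.
    namespace = dict(keywords)

    for module_name in imports.split():
        try:
            importlib.import_module(module_name)
            # Bind the top-level package, as `import a.b` would.
            top_level = module_name.split(".")[0]
            namespace[top_level] = importlib.import_module(top_level)
        except ImportError:
            logger.error("Could not import %r for keyword %r", module_name, name)

    value = eval(expression.strip(), namespace)  # arbitrary code: see warnings
    keywords[name] = value  # effective only once evaluation has finished
    return value
```

With `keywords = {"first": "Ada", "last": "Lovelace"}`, evaluating the text `first + ' ' + last` under the name `full` adds a `full` keyword that later expressions can use.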
<figure>¶
Generate a figure using the selected handler, and insert it into the document. If Image Logging is enabled, a separate file with the image will be generated as well.
Figures are referenceable through the <figure-ref> tag.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the figure. The name must appear in the IDC File. This is also the ID used by the <figure-ref> tag to link back to this tag.
- handler : str
- The full name of the figure handler class that will generate the content.
- style (opt) : Character Style
- The name of the style of the run containing the figure. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
- pstyle (opt) : Paragraph Style
- The name of the style of the paragraph containing the figure. Must be defined in the DOCX Stub and be a paragraph style.
- width (opt) : int + {'in', 'px', 'cm', 'mm', 'pt', 'emu'}
- The width of the figure. Units are optional, and default to inches ('in'). Suffixes can be separated from the number by optional whitespace.
- height (opt) : int + {'in', 'px', 'cm', 'mm', 'pt', 'emu'}
- The height of the figure. Units are optional, and default to inches ('in'). Suffixes can be separated from the number by optional whitespace.
The attributes handler, style, pstyle, width and height can be overridden by keys with the same name in the Data Configuration for the figure. If neither width nor height is specified, the figure will be inserted as-is. If only one of them is specified, the figure will be scaled proportionally.
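The width and height rules can be sketched as below. The helper names `parse_dimension` and `fit_size` are hypothetical, the regular expression is only one possible reading of "int plus optional unit suffix", and `fit_size` assumes that all of its numbers are already in the same unit.

```python
import re

UNITS = ("in", "px", "cm", "mm", "pt", "emu")

# An integer, optional whitespace, then an optional unit suffix.
_DIMENSION = re.compile(r"^\s*(\d+)\s*([a-z]+)?\s*$")


def parse_dimension(text):
    """Split a width/height attribute into (value, unit).

    The unit defaults to inches when no suffix is given.
    """
    match = _DIMENSION.match(text)
    if match is None:
        raise ValueError(f"not a valid dimension: {text!r}")
    unit = match.group(2) or "in"
    if unit not in UNITS:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    return int(match.group(1)), unit


def fit_size(native, width=None, height=None):
    """Return the (width, height) to use for a figure.

    native: the figure's own (width, height), in the same unit as the
            requested values (an assumption of this sketch)
    """
    native_w, native_h = native
    if width is None and height is None:
        return native_w, native_h                  # inserted as-is
    if height is None:
        return width, native_h * width / native_w  # keep aspect ratio
    if width is None:
        return native_w * height / native_h, height
    return width, height                           # both given explicitly
```

For instance, `parse_dimension("300 px")` gives `(300, "px")`, and a 4 by 3 figure with only `width=8` comes out as 8 by 6.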
Content¶
<figure-ref>¶
Insert a reference to a <figure>, or another tag playing the role of a <figure>.
The reference will look something like Figure 1.2-1, depending on the configured heading depth and separators.
Attributes¶
- id : Python Identifier
- The id of the corresponding <figure>.
Content¶
<kwd>¶
Perform a keyword replacement. Keywords are defined as in the IPC File. The entire tag is replaced with the value of the keyword.
Attributes¶
- name : Python Identifier
- The name of the keyword to replace.
- format (opt) : format_spec
- A format specification that can be used to convert the value into a string.
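Assuming that `format_spec` follows Python's format specification mini-language (the attribute type suggests this, but the assumption is not confirmed here), a keyword replacement could be sketched as follows. The function name `replace_keyword` is hypothetical.

```python
import datetime


def replace_keyword(name, keywords, format_spec=""):
    """Return the text that would replace a <kwd> tag.

    An empty format_spec falls back to the value's default string form.
    """
    return format(keywords[name], format_spec)


# Hypothetical keyword definitions, standing in for an IPC File.
keywords = {
    "revision": 7,
    "tolerance": 0.03125,
    "issued": datetime.date(2020, 1, 2),
}

revision_text = replace_keyword("revision", keywords, "03d")
tolerance_text = replace_keyword("tolerance", keywords, ".2%")
issued_text = replace_keyword("issued", keywords, "%Y-%m-%d")
```

The same mechanism covers numbers and dates because each Python type interprets the specification in its own way.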
Content¶
<latex>¶
Insert a LaTeX formula into the document as an image. This tag is only available if the appropriate dependencies are installed.
Equations interrupt the current run if their run style does not match the style of the current run.
Attributes¶
- style (opt) : Character Style
- The name of the style of the run containing the equation. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
- pstyle (opt) : Paragraph Style
- The name of the style to use for the equation’s paragraph, if it appears outside of an existing paragraph. Ignored if this tag appears inside a <par> tag. If used, must be defined in the DOCX Stub and be a paragraph style.
- dpi (opt) : int
- The DPI of the output image. Defaults to 96.
- format (opt) : Image Format
- The output format. Defaults to 'jpg'.
- size (opt) : int or None
- The text size, in points, used to render the equation. The default is to let LaTeX decide.
<n>¶
Insert a line-break into the document. Line breaks only make sense within a paragraph, so this tag is ignored with a warning outside <par> tags.
Normally, this tag should appear inside a <run>. If not, the line break will be appended to the previous <run> in the current paragraph, or a new run will be created for it if it appears as the first tag.
Attributes¶
None
Content¶
<par>¶
Contains a paragraph of text. A paragraph is a collection of runs of differently formatted text, as well as some other elements. A paragraph can be styled with a paragraph-level style. Runs within a paragraph can have additional character-level styling that combines with or overrides the paragraph style.
Paragraphs should appear immediately under the document root to avoid warnings. Paragraphs that do not follow this rule (e.g., by being nested within each other) will be broken up unpredictably, with a slew of warnings.
Paragraphs are automatically referenceable if they have a heading style. Non-heading paragraphs must explicitly declare their role to be par, just like any non-par tag posing as a heading. References can be made using the <segment-ref> tag.
Attributes¶
- style (opt) : Paragraph Style
- The name of the style to use for this paragraph. Must be defined in the DOCX Stub and be a paragraph style.
- id (opt) : Reference ID
- The ID of this paragraph, if it is being used as the target of a <segment-ref>. If an ID is not supplied, the segment can be referenced only through the title attribute of the <segment-ref>. IDs will be ignored for any non-heading paragraph without an explicit role.
- list (opt) : {continued, bulleted, numbered}
- If this paragraph is a list item, set this attribute to one of the allowed values. Options are case insensitive, and can be truncated: bullet and NUM are both examples of valid options as well.
This attribute is required to make a list item. If it is missing, the paragraph will not be bulleted/numbered, even if a list style is applied to it.
continued will continue the style/numbering of the previous list item, no matter how many other items were inserted in between. The other options always start a new list with the default style determined by the list type.
- list-level (opt) : int
- A non-negative integer specifying the depth of the current list item. Numbers are generated automatically. If the paragraph immediately preceding this one is a list item, the depth is preserved by default (as is the style). Otherwise, the default depth for a new list is 1. Missing depth-levels get filled in automatically if the depth jumps by an increment of more than 1. Ignored if list is not set.
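One way to read the "case insensitive, and can be truncated" rule for the list attribute is as a unique prefix match. The sketch below works under that assumption; `resolve_list_option` is a hypothetical helper, not an Imprint function.

```python
LIST_OPTIONS = ("continued", "bulleted", "numbered")


def resolve_list_option(value):
    """Map a possibly truncated, any-case value to a full option name."""
    key = value.strip().lower()
    # An option matches if the supplied text is a non-empty prefix of it.
    matches = [option for option in LIST_OPTIONS if key and option.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"not a valid list option: {value!r}")
    return matches[0]
```

Under this reading, `bullet` resolves to `bulleted` and `NUM` resolves to `numbered`, matching the two examples given above. Because the three options start with different letters, even a single letter is unambiguous.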
Content¶
Tags only. Any spurious text that is found will be placed into a run with the default style, along with a warning.
<run>¶
Contains a run of text, which is normally just characters, with optional keyword replacements. Runs are aggregated into <par> tags. A run can have a character-level style independent from all the other runs in the paragraph.
Attributes¶
- style (opt) : Character Style
- The name of the style to use for this run of characters. Must be defined in the DOCX Stub and be a character style.
Content¶
Text and tags. Runs should always appear directly inside a <par> tag. Nested <run> tags will cause a fatal error. Runs outside a <par> tag will cause a warning, and an implicit paragraph will be placed around them. Most other tags are allowed in a run, but may interrupt it; the run is then resumed afterwards with the same character style.
<section>¶
Introduces a new section into the document. Sections define the page parameters in the document. This tag begins a new section (rather than enclosing a section), which will continue until the next <section> tag or the end of the document.
Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed on the following page with the same styles.
Attributes¶
- orientation (opt) : {'Portrait', 'Landscape'}
- The page orientation of this section. Values are case-insensitive.
The supported attributes for this tag may be expanded in the future.
Content¶
<segment-ref>¶
Insert a reference to a <par> with a heading style, or another tag playing the role of a heading <par>.
The reference will look something like Section 1.2-1: Title, depending on the configured prefix, heading depth and separators.
Attributes¶
- id (opt) : Python Identifier
- The id of the corresponding <par>.
- title (opt) : String
- The actual text of the corresponding <par>.
One of id and title must be present. If both are present, they must refer to the same target, or a fatal error will occur.
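The id and title rule can be sketched as a small lookup. Everything here is hypothetical: the list-of-dictionaries representation of referenceable segments, the names `find_segment` and `FatalReferenceError`, and the choice to return `None` for a target that does not exist (the behavior for a missing target is not specified in this section).

```python
class FatalReferenceError(Exception):
    """Hypothetical stand-in for a fatal error raised by the processor."""


def find_segment(segments, ref_id=None, title=None):
    """Locate the target of a <segment-ref>.

    segments: list of dicts with "id" (may be None) and "title" keys,
              one per referenceable paragraph (an assumed representation)
    """
    if ref_id is None and title is None:
        raise FatalReferenceError("one of id and title must be present")

    by_id = next((s for s in segments if ref_id is not None and s["id"] == ref_id), None)
    by_title = next((s for s in segments if title is not None and s["title"] == title), None)

    # If both attributes are given, they must point at the same target.
    if ref_id is not None and title is not None and by_id is not by_title:
        raise FatalReferenceError("id and title refer to different targets")

    return by_id if ref_id is not None else by_title
```

A heading without an id can still be found through its title, which mirrors the fallback described for the id attribute of <par>.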
Content¶
<skip>¶
Marks a piece of text for further investigation, without any other side-effects.
The only purpose of this tag is to provide better logging of marked text, and to suppress warnings when it occurs.
Attributes¶
None
Content¶
Text and tags.
<string>¶
Generates a dynamic string based on the selected handler. Strings are expected to appear within a <run>. Any other location will generate a warning.
This tag is similar to <kwd>, except that it creates content based on a dynamic runtime configuration rather than just the static mapping of keywords.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the string. The name must appear in the IDC File.
- handler : str
- The full name of the string handler class that will generate the content.
Content¶
<table>¶
Generates a table using the selected handler. Tables are constructed directly in the document, so any errors generated by the handler will result in a table stub along with the alt-text being placed in the document.
Tables are stand-alone entities. If this tag appears inside a <run> or <par> tag, a warning will be logged, and the paragraph and character styles will be resumed as necessary after the table.
Tables are referenceable through the <table-ref> tag.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the table. The name must appear in the IDC File. This is also the ID used by the <table-ref> tag to link back to this tag.
- handler : str
- The full name of the table handler class that will generate the content.
- style (opt) : Table Style
- The name of the style to use for this table. Must be defined in the DOCX Stub and be a table style.
Content¶
<table-ref>¶
Insert a reference to a <table>, or another tag playing the role of a <table>.
The reference will look something like Table 1.2-1, depending on the configured heading depth and separators.
Attributes¶
- id : Python Identifier
- The id of the corresponding <table>.
Content¶
<toc>¶
Insert a Table of Contents (TOC) into the document. Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed after the TOC with the same styles.
Attributes¶
- min (opt) : int
- The minimum heading level that the TOC supports. Defaults to 1.
- max (opt) : int
- The maximum heading level that the TOC supports. Defaults to 3.
- style (opt) : Paragraph Style
- The name of the style to use for the heading paragraph within the TOC. Must be defined in the DOCX Stub and be a paragraph style.
Extensions¶
Additional tags may be registered through the XML Tag API. New tags may not conflict with existing names, but otherwise have no real restrictions.
Glossary¶
The following terms are used frequently throughout this document:
- error
- A logged message that means that the current operation was aborted. The remainder of the document will still be processed.
- fatal error
- An error that is unrecoverable. In addition to being logged and aborting the current operation, the remainder of the document will not be processed.
- Image Format
- A short string indicating an image format for conversion tools. Common formats include 'jpg', 'png', 'bmp', etc. Most Imprint features will default to either JPG or PNG format.
- No Content
- Nesting a tag or placing text in a tag that has this content description will cause a fatal error. The tag must effectively be of the form <tag/> or <tag></tag>. Whitespace is not considered to be content, so it may be present between an opening and closing tag.
- A tag is referenceable if it has a role attribute, or if it has reference functionality built into it. For more information on references, see the corresponding section in the tag API description: References.
- Text Only
- Nesting a tag in a tag that has this content description will cause a fatal error.