Welcome to Imprint’s documentation!¶
Introduction to Imprint¶
Welcome to Imprint!
What is Imprint?¶
Imprint is a framework for automating the generation of similarly-structured documents in MS Office Open XML format (docx). Its goal is to provide a robust, repeatable, and reliable system for generating complex content. It eliminates the issues associated with manually generating repeated content.
An example use is hardware testing reports. The report for each component has the same structure as all the other reports. However, the numbers, charts and tables have to be obtained from data sets specific to that hardware component.
How Does It Work?¶
Imprint is structured as a set of components arranged in layers between the user and the final document that it creates.
Configuration Layer¶
The files in the configuration layer are provided for each report. They contain a distillation of all the differences between reports of a given type. IPC files configure the Engine Layer (the essence of Imprint itself). IDC files direct the inputs and behavior of the Plugins Layer.
Templates Layer¶
Templates are static configuration files. The structure of the document, along with all static text and placeholders for generated content is laid out in the XML Template file. The styles referenced in the XML are defined in an empty Word document, the DOCX Stub.
The IIF Files serve as a bridge between the Configuration Layer and the Templates Layer. They follow a similar keyword definition format to the configuration, but provide static content that is intended to be shared between reports. Include files are used to clean up redundancy in the Configuration Layer by aggregating static information[1].
Plugins Layer¶
Content-generation plugins are written to handle the data-specific content of complex tags dynamically. A well-written plugin can be used across multiple document types in an organization. Plugins can generate images, tables and text with values that depend on a dynamic configuration. The type of content a plugin generates, and the interface it follows, is determined by the tag that it supports.
Concretely, plugins are Python classes (or functions) that implement the exact interface laid out by their parent tag (see Plugin API). An introduction to writing plugins is provided in the Writing Plugins tutorial. Live examples can be found throughout any Imprint deployment.
The input data and plugin behavior are defined by an IDC File in the Configuration Layer, so the same plugin can be used to generate all sorts of content based on different configurations. For example, hardware reports will generally contain tables of statistics and some sort of chart or histogram to accompany them. Having both of those plugins share data loading and preprocessing code (and usually their data configuration dictionary as well) guarantees consistent results.
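The benefit of shared loading code can be sketched in plain Python. The functions below are hypothetical and do not follow the Imprint Plugin API. They only illustrate a table-style and a chart-style generator that draw on one loader and one data configuration dictionary, so that both always see the same preprocessed values.

```python
import statistics


def load_heights(data_config):
    """Shared loading and preprocessing: drop missing values, convert to float."""
    return [float(v) for v in data_config['values'] if v is not None]


def summary_table(data_config):
    """Rows of (label, value) statistics, as a table generator might produce."""
    heights = load_heights(data_config)
    return [
        ('count', len(heights)),
        ('mean', statistics.mean(heights)),
        ('max', max(heights)),
    ]


def histogram_counts(data_config, bin_width=1.0):
    """Counts per bin start, as a chart generator might compute before plotting."""
    counts = {}
    for height in load_heights(data_config):
        start = (height // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical data configuration dictionary shared by both generators.
data_config = {'values': [1.0, 1.5, None, 2.5]}
table = summary_table(data_config)
histogram = histogram_counts(data_config)
```

Because both functions call `load_heights`, the number of samples in the histogram always matches the count reported in the table.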
Engine Layer¶
The engine is the core of Imprint that runs the entire system. It is responsible for setting up the runtime environments, ingesting all the configuration and directing the operation of all the plugins. The engine is executed through entry points in the Programs.
Output Layer¶
The final layer is the output. In addition to the main document, Imprint provides an enormous amount of traceability with its Logging output. The log file itself can be set up through the IPC File. Both the name and the logging level are configurable. In addition to the log, all images that are generated for insertion into the document can be stored in separate files as well. This option is also configurable through the IPC File.
Historical Note¶
How did Imprint come into being?
Around the years 2016-2018, the analysts at the Detector Characterization Lab (DCL) at NASA Goddard Space Flight Center (GSFC) working on the Euclid project were creating reports of all the flight-grade SCAs[2] and SCSs[4]. These reports were around 50 pages each, contained figures and tables describing the analysis of every aspect of the testing being done on each component, and were written individually by hand. Usually, the analysts would start with an existing report as a template, and modify the pictures, numbers and tables based on their results.
This presented a number of issues, all of which could be solved with automation. The size of the report, and the amount of data each one contained, made replacing items both time consuming and error prone. This was exacerbated by the fact that the same data was used to generate multiple sets of figures, tables and text elements within a given document. And of course the number of reports being generated made it difficult to keep track of versions and templates. For one thing, it was easy to update one of the figures or tables and forget the other. For another, any typos that were found and corrected in the static text of the document would not always find their way back to all the existing versions, and therefore possibly not into future ones either.
The reports were being used for two purposes. The long-term purpose would be to archive the detector data, so that all the test data would be available for in-flight debugging teams. In the short term, the reports were used to communicate test results to the customer, the European Space Agency (ESA). With this set of goals, having minor but persistent errors in the documents was deemed unacceptable, as was the amount of time being spent by qualified analysts in editing Microsoft Word documents.
A program called RepGen was created to solve most of the issues encountered with the generation of such reports. Its primary requirements were to be robust, accurate, reliable, repeatable and traceable. It placed all of the static text into an XML template, making it trivial to fix typos across all reports and revisions at once. The configuration files for a particular report were structured to eliminate redundancy of information, improving traceability. Shared include files, along with a sensible structure of data, were used to turn the creation of new configurations into a two-step copy-and-paste job. The content generation code was technically left up to the group using RepGen. However, code reuse was encouraged here as well, and certainly built into the basic handlers, so that consistent results could be expected from a single dataset used in multiple types of content. Plugins allowed similar types of information about different data sets to be rendered in a consistent format in multiple places in a report. All operations were logged to any level desired, including the generation of all content, so errors and inconsistencies could be found quickly and easily.
RepGen went on to become Electronic New Technology Report (eNTR) #1518805444 at NASA. Imprint is a philosophical child of RepGen. It does not share any of the old code, but it does provide a significantly improved version of the same sort of flexibility as its inspiration.
Where do I go From Here?¶
If you are a new user of Imprint, the recommended place to start is the Tutorials section. The Getting Started page especially will help you get a sense of how to set up an Imprint project for the first time.
The other main area of the documentation, Reference, is for more advanced users. It contains the formal definitions and specifications of the interfaces used by the system.
If you are unsure where to go next, the Main Page is always a good place to start browsing through all of the available topics.
Footnotes
[1] The <expr> tag provides a more limited way to do this as well.
[2] The Sensor Chip Array (SCA) is basically the detector chip.
[3] Sensor Chip Electronics (SCE) is the ASIC used to operate the detector.
[4] The Sensor Chip System (SCS) is the SCA[2] combined with the SCE[3].
Installation Guide¶
This document explains how to install Imprint.
Installing the Package¶
PyPI¶
Imprint is available via PyPI, so the recommended way to install it is
pip install imprint[all]
The extra [all] installs most of the Dependencies necessary to generate simple images and tables. It can be omitted for a bare-bones install.
Source¶
Imprint uses setuptools, so you can install it from source as well. If you have a copy of the source distribution, run
python setup.py install
from the project root directory, with the appropriate privileges. A source distribution can be found on PyPI as well as directly on GitHub.
You can do the same thing with pip if you prefer. Any of the following should work, depending on how you obtained your distribution
pip install git+<URL>/imprint.git@master[all] # For a remote git repository
pip install imprint.zip[all] # For an archived file
pip install imprint[all] # For an unpacked folder or repo
See the page about Dependencies for a complete description of additional software that may need to be installed. Using setup.py or pip should take care of all the Python dependencies.
Demos¶
Imprint is packaged with a set of demo projects intended primarily for the Tutorials. The demos are not normally installed as part of Imprint. Instead, they are to be accessed through the source repository or the Documentation, once that is built. See Demos for a complete list.
Tests¶
Imprint does not currently have any formal unit tests available. However, running through all of the demos serves as a non-automated set of tests, since they exercise nearly every part of Imprint. Eventually, pytest-compatible tests will be added in the tests package.
Documentation¶
If you intend to build the documentation, you must have Sphinx installed, and optionally the ReadTheDocs Theme extension for optimal viewing. See the dependencies spec for more details.
The documentation can be built from the complete source distribution by using the specially defined command:
python setup.py build_sphinx
Alternatively (perhaps preferably), it can be built using the provided Makefile:
cd doc
make html
Both options work on Windows and Unix-like systems that have make installed. The Windows version does not require make. On Linux you can also do
make -C doc html
Building the documentation will also make a copy of the Demos.
The documentation is not present in the PyPI source distributions, only directly from GitHub.
Tutorials¶
The pages in the tutorials section show step-by-step instructions on how to get Imprint up and running. They cover virtually every aspect of the program from the point of view of various types of users. For a quick reference, consult the Reference documents.
For the basic user, there is the Getting Started page. It is the recommended next step for all first-time users. Once you have mastered that, Basic Tutorial will show you a more complete picture.
For the more advanced customization techniques, start with Additional Topics, Part 1, Additional Topics, Part 2, and Styles and Formatting. More advanced subjects, with coding involved, are explored in the Writing Plugins and Writing Custom Tags tutorials.
Developers can probably jump right into the Reference section with the Plugin API and the Tag API.
If you are unsure where to go next, the Main Page is always a good place to start browsing through all of the available topics.
Getting Started¶
If you are a first time user, you have come to the right place. This tutorial is the “Hello World!” example for Imprint. It demonstrates the most basic setup, and hopefully explains some of the possible uses for Imprint in doing so. Most of the material shown here is reiterated with more detail in the Basic Tutorial.
Creating a New Project¶
The easiest way to set up a new project is usually to copy an existing one. If that is not an option, create a new folder for your new project. All of the Paths in a project will be resolved relative to that folder, so it will be self-contained.
If you would like to simulate copying an existing project, download and extract the HelloWorld example. If you would like to start a new project, create a folder named HelloWorld somewhere, and follow along with the rest of this tutorial. Unless otherwise stated, all the files described below exist under the root HelloWorld folder.
Making a Template¶
First let’s begin by laying out the structure and content of our document in an XML Template. Our basic template for this example will look like this:
HelloWorld.xml: The document content and structure template.
<imprint-template>
  <par style="Normal">
    <run style="Default Paragraph Font">
      Hello <kwd name="What"/>!
    </run>
  </par>
</imprint-template>
Let us inspect the contents of this file tag-by-tag to understand what is going on.
The outermost <imprint-template> tag is necessary to make the XML into an Imprint template.
Document text is arranged into paragraphs, which are surrounded by <par> tags. Our example has only one such tag, and therefore only one paragraph. The paragraph has a Normal style. Paragraphs can contain different <run>s of character-level formatting, but it is fairly standard to have a single run with the Default Paragraph Font style. This style means that all the paragraph-level styling information is left untouched.
Finally, the innermost portion is the text of the paragraph. Our example contains two elements: the literal word Hello, and a <kwd> tag. This tag tells the Engine Layer to perform a keyword replacement. The name of the keyword is What. We will see how to define the value of What next. This value will be placed literally into the document, replacing the <kwd> tag. You can begin to imagine how this could be useful for generating multiple documents from the same template.
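The effect of keyword replacement can be sketched with the standard library. This is only an illustration of the idea, not how the Engine Layer is implemented: the `render_run` helper is hypothetical, the value of What is supplied by hand in a dictionary, and collapsing whitespace is a simplification made for the sketch.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """
<imprint-template>
  <par style="Normal">
    <run style="Default Paragraph Font">
      Hello <kwd name="What"/>!
    </run>
  </par>
</imprint-template>
"""


def render_run(run, keywords):
    """Return the text of a <run>, with each <kwd> replaced by its value."""
    parts = [run.text or '']
    for child in run:
        if child.tag == 'kwd':
            # The keyword value is placed literally where the tag was.
            parts.append(str(keywords[child.get('name')]))
        # Text following the tag belongs to the same run.
        parts.append(child.tail or '')
    # Collapse the indentation and line breaks of the template (sketch only).
    return ' '.join(''.join(parts).split())


root = ET.fromstring(TEMPLATE)
rendered = [render_run(run, {'What': 'World'}) for run in root.iter('run')]
print(rendered)
```

Changing the dictionary to `{'What': 'Imprint'}` yields a different sentence from the same template, which is the point of keeping the template and the keyword values separate.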
Note
Keep in mind that this template is very simple and easy to write. A normal Imprint template is usually quite large, and should be created only once for a large number of documents. Normally, the template will be stored outside the setup directory, where it can be accessed by many configurations.
Creating the Configuration¶
The second file we will create for this example is the program configuration. This file tells the Engine Layer what to do, in addition to setting up the User-Defined Keywords, like What, required by the template. Here is our configuration file:
HelloWorld.ipc: The document configuration script.
input_xml = 'HelloWorld.xml'
output_docx = 'HelloWorld.docx'
overwrite_output = 'silent'
What = 'World'
This is a simple Python file that defines some Keywords.
Keywords starting with lowercase letters are System Keywords. input_xml and output_docx are both mandatory: Imprint will raise an error and abort immediately without them. The former references the template we just created, while the latter gives the name of the output document.
overwrite_output is an optional system keyword. It tells Imprint what to do if the output already exists. Setting it to 'silent' as we did here tells the engine to overwrite an existing output file without further ado. You can omit this keyword entirely, but the default is to raise an error if output_docx already exists.
Keywords starting with uppercase letters are User-Defined Keywords. Our example only has one user-defined keyword: What. The value of this keyword is used to replace the <kwd> tag in our XML template.
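The naming convention can be illustrated with a short sketch. The helpers `split_keywords` and `check_mandatory` are hypothetical and are not part of Imprint. They only mirror the rules described above: lowercase names are system keywords, uppercase names are user-defined, and input_xml and output_docx must be present.

```python
MANDATORY = ('input_xml', 'output_docx')


def split_keywords(config):
    """Separate system keywords (lowercase start) from user-defined ones."""
    system = {k: v for k, v in config.items() if k[:1].islower()}
    user = {k: v for k, v in config.items() if k[:1].isupper()}
    return system, user


def check_mandatory(system):
    """Raise an error if a mandatory system keyword is missing."""
    missing = [name for name in MANDATORY if name not in system]
    if missing:
        raise KeyError('missing mandatory keywords: ' + ', '.join(missing))


# The keywords from HelloWorld.ipc, written out as a dictionary.
config = {
    'input_xml': 'HelloWorld.xml',
    'output_docx': 'HelloWorld.docx',
    'overwrite_output': 'silent',
    'What': 'World',
}
system, user = split_keywords(config)
check_mandatory(system)
```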
The order of keywords does not matter. You can shuffle them however you want, mix system and user defined keywords, and generally do whatever seems best. However, since this is Python code, keywords can reference each other. In that case, any keywords on the right hand side of the assignment must be defined before they are referenced for the first time.
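As an illustration of keywords referencing each other, the variation below derives both file names from a single user-defined keyword. The keyword Project is hypothetical and does not appear in the HelloWorld example. It must be assigned before the lines that use it.

```python
# Hypothetical variation of HelloWorld.ipc (a sketch, not the shipped file).
Project = 'HelloWorld'            # user-defined keyword, defined first

input_xml = Project + '.xml'      # references Project, so it must come after it
output_docx = Project + '.docx'
overwrite_output = 'silent'

What = 'World'                    # independent keywords can go anywhere
```

Moving the `Project` assignment below `input_xml` would fail with a `NameError`, since the file is executed top to bottom like any other Python code.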
All of the paths in configuration files are resolved relative to the folder containing the IPC File. This means that you can copy the entire folder, make some modifications to the configuration, and run it to get an entirely different and independent document.
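The resolution rule can be sketched as follows. The function `resolve_path` and the folder names are hypothetical. The sketch only shows that a relative path is anchored to the folder containing the IPC File rather than to the current working directory.

```python
from pathlib import PurePosixPath


def resolve_path(ipc_file, value):
    """Resolve a configured path relative to the folder holding the IPC file."""
    # Joining with an absolute path returns that path unchanged.
    return PurePosixPath(ipc_file).parent / value


# Hypothetical project locations, used only for illustration.
original = resolve_path('/projects/HelloWorld/HelloWorld.ipc', 'HelloWorld.docx')
copied = resolve_path('/projects/HelloCopy/HelloWorld.ipc', 'HelloWorld.docx')
```

The same relative value produces a different output location for the copied folder, which is why a copied project is independent of the original.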
Running Imprint¶
You now have a working setup. imprint is a command-line tool. You can run it by passing in a single argument: the name of the configuration file. Assuming that your current working directory is set to HelloWorld, you can generate your first document by doing
imprint HelloWorld.ipc
That’s it. You should now have a file called HelloWorld.docx. If you open it in MS Word, you will see the text “Hello World!”.
The output will be the same (and in the same place) regardless of what directory you run imprint from.
In this simple example, we did not show the use of plugins, logging, or any of the other advanced features of Imprint. Look into the other Tutorials, starting with the Basic Tutorial for additional information.
Basic Tutorial¶
This tutorial offers a more realistic example of how to set up a simple project from scratch than the Getting Started page. For this tutorial, we will create a somewhat contrived, but fairly polished document describing a made-up series of candle flame height measurements.
Project Setup¶
The files for this tutorial are available in the CandleFlame example. You may choose to download and extract the provided archive, or start with an empty folder named CandleFlame and populate it as the tutorial progresses.
For this tutorial, we will emphasize the differences between the Configuration Layer and the Templates Layer. Our templates (both XML Template and DOCX Stub) are placed in a separate folder named CandleFlame/templates. Normally, this folder would be outside the document configuration entirely, so that it can be shared by multiple documents. The IIF Files will be placed here as well, to emphasize their shared role.
Additional Topics, Part 1¶
This tutorial covers some of the topics not covered in the Basic Tutorial. The focus here is on the flexibility offered by the configuration files, especially the XML Template and IPC File. A basic understanding of the topics covered in the Basic Tutorial is assumed.
For a tutorial covering topics more targeted towards formatting through the DOCX Stub and plugin usage, see Additional Topics, Part 2.
Project¶
The project for this tutorial is Games. The discussion will only focus on the relevant portions of the relevant files, so readers are encouraged to download and extract the entire project before delving into the tutorial.
The document created in the project will contain a couple of trivial lists of board games just for illustration. The baseline version can be created using Games.ipc:
imprint Games.ipc
This produces the output Games.docx.
Overriding Keywords¶
The includes system keyword is not only for IIF Files. It can also be used to modify portions of the IPC File in a traceable and repeatable manner.
Using the fact that included files cannot override defined keywords, we can define an IPC File snippet that just overrides the keywords that we want, and includes everything else from the original file:
Games0.ipc: Overriding the caption_counter_depth.
caption_counter_depth = 0
includes = ['Games.ipc']
The example shown here is used to modify the caption_counter_depth setting. The same technique can be used equally well to modify other System Keywords as well as User-Defined Keywords. Such modification is useful for testing and to create documents that are closely related to each other in terms of most of their configuration.
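The precedence rule behind this technique can be sketched with a toy loader. The `load_config` function is hypothetical and is not Imprint's loader, and the contents given for Games.ipc are invented for the illustration. The sketch only shows that a keyword defined in the including file wins over the same keyword in an included file.

```python
# Toy in-memory "files"; the contents of Games.ipc are made up for the sketch.
FILES = {
    'Games.ipc': "caption_counter_depth = 1\noutput_docx = 'Games.docx'\n",
    'Games0.ipc': "caption_counter_depth = 0\nincludes = ['Games.ipc']\n",
}


def load_config(name, files):
    """Execute a configuration, then fill in keywords from its includes."""
    namespace = {}
    exec(files[name], {}, namespace)
    for included in namespace.get('includes', []):
        for key, value in load_config(included, files).items():
            # Included files cannot override keywords that are already defined.
            namespace.setdefault(key, value)
    return namespace


config = load_config('Games0.ipc', FILES)
print(config['caption_counter_depth'], config['output_docx'])
```

Everything that Games0.ipc does not define is inherited from Games.ipc, so the override file stays as small as the difference it describes.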
Games0.ipc and its siblings Games2.ipc, Games3.ipc, and GamesNone.ipc are revisited in the section on Setting the Caption Counter Depth.
Making Lists¶
Lists are created by setting the list attribute of the <par> tags. List items are just regular paragraphs with some extra styling added on for bullets or numbering. A sample XML Template with list items looks like this:
<imprint-template>
<par style="Title"><run>Sample of Board Games</run></par>
<toc>Contents</toc>
<par style="Heading 1">
<run>
Gridded Games
</run>
</par>
<par>
<run>
The following images show the boards used by different types of gridded
games. <figure-ref id="checkers"/> and <figure-ref id="chess"/> show
checkers and chess, respectively. These are some of the most common
games played on a static pre-made board. <figure-ref id="tic_tac_toe"/>
and <figure-ref id="battleship"/> show tic-tac-toe and battleship
boards. These are no less ubiquitous, but are generally played on
hand-drawn boards.
</run>
</par>
<par style="Heading 2">
<run>
Board Games
</run>
</par>
<par><run>This section lists games played on pre-made boards.</run></par>
<par list="numbered"><run>Checkers</run></par>
<figure id="checkers" handler="imprint.handlers.figure.ImageFile" />
<par list="continued"><run>Chess</run></par>
<figure id="chess" handler="imprint.handlers.figure.ImageFile" />
<par style="Heading 2">
<run>
Paper Games
</run>
</par>
<par>
<run>
This section lists games played on hand-drawn paper "boards".
While pre-made board versions of these games exist, they are
traditionally played on paper. For strictly board-type games, see
<segment-ref title="Board Games"/>.
</run>
</par>
<par list="num"><run style="Default Paragraph Font">Tic Tac Toe</run></par>
<table id="tic_tac_toe" handler="imprint.handlers.table.CSVFile" role="figure" />
<par list="cont" style="List Number"><run>Battleship</run></par>
<figure id="battleship" handler="imprint.handlers.figure.ImageFile" />
</imprint-template>
To start a new list, set the list attribute to either numbered or bulleted. Lines 26 and 43 in the example show how this is done. The full word (which is case-insensitive, by the way) can be spelled out, or any prefix of it can be used, as in line 43.
To append elements to a list, set list to continued, as in lines 28 and 45. The list will be continued regardless of how many additional paragraphs or other elements are placed between the list items. For example, the figures on lines 27, 29 and 46 and the table on line 44 do not break up the numbering scheme of the two lists we created.
Additional information is available in the List Styling tutorial.
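The matching rule for the list attribute (case-insensitive, any prefix accepted) can be sketched as below. The function `normalize_list_attribute` is hypothetical and only illustrates the rule. How Imprint itself treats an empty or unrecognized value is an assumption here.

```python
LIST_OPTIONS = ('numbered', 'bulleted', 'continued')


def normalize_list_attribute(value):
    """Map a list attribute such as 'num' or 'CONT' to its full option name."""
    key = value.strip().lower()
    matches = [option for option in LIST_OPTIONS if option.startswith(key)]
    if key and len(matches) == 1:
        return matches[0]
    # Assumption for the sketch: anything else is treated as an error.
    raise ValueError('unrecognized list attribute: {!r}'.format(value))


for attribute in ('numbered', 'num', 'cont', 'Bulleted'):
    print(attribute, '->', normalize_list_attribute(attribute))
```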
Adding References¶
The figure and section references in the text are dynamically generated reference names, which are automatically derived from the position of the figure or heading in the document outline.
The figure references are created by the <figure-ref> tags on lines 12, 14 and 15:
Games.xml, lines 10-18, emphasizing the <figure-ref>s.
    <run>
      The following images show the boards used by different types of gridded
      games. <figure-ref id="checkers"/> and <figure-ref id="chess"/> show
      checkers and chess, respectively. These are some of the most common
      games played on a static pre-made board. <figure-ref id="tic_tac_toe"/>
      and <figure-ref id="battleship"/> show tic-tac-toe and battleship
      boards. These are no less ubiquitous, but are generally played on
      hand-drawn boards.
    </run>
Each <figure-ref> identifies the figure it refers to by its id attribute. This is how most References are identified.
The text reference on line 40 is created by a <segment-ref> tag, which points to paragraphs. Since paragraphs do not normally have an id attribute, they can be referenced by title instead:
Games.xml, lines 39-41, emphasizing the <segment-ref>.
      traditionally played on paper. For strictly board-type games, see
      <segment-ref title="Board Games"/>.
    </run>
The title of a <segment-ref> is the full text of the heading or other paragraph that is being pointed to, with all the extra spaces and line-breaks removed:
Games.xml, lines 20-24, emphasizing the heading title that the <segment-ref> refers to.
  <par style="Heading 2">
    <run>
      Board Games
    </run>
  </par>
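The whitespace rule for titles can be illustrated in a couple of lines. The helper `normalize_title` is hypothetical. It shows one way to strip the extra spaces and line breaks that come from indenting the heading text in the template.

```python
def normalize_title(text):
    """Collapse runs of spaces and line breaks into single spaces."""
    return ' '.join(text.split())


# The heading text as it appears inside the <run>, indentation included.
heading_text = "\n      Board Games\n    "
print(repr(normalize_title(heading_text)))
```

The normalized value is what the title attribute of the <segment-ref> has to match, regardless of how the heading is indented in the XML.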
Among builtin tags, <figure> and <par> can be referenced by <figure-ref> and <segment-ref>, respectively. We have not seen <table-ref> tags in the tutorial so far, which reference <table>s. <table-ref> works just like <figure-ref>, but with “Table” in the reference name instead of “Figure”.
Setting the Caption Counter Depth¶
The formatting of the figure number for <figure-ref> (and table number for <table-ref>) is set by the caption_counter_depth system keyword.
The default caption_counter_depth is 1, meaning that only the top-level heading is considered when counting and naming figures. If we were to change caption_counter_depth to, say, 2:
Games2.ipc: Overriding the caption_counter_depth, setting the depth to 2.
caption_counter_depth = 2
includes = ['Games.ipc']
We would see two elements in the heading portion of each reference, and the figure counter would restart with every second-level heading instead of just the top-level heading.
This snippet provides additional illustration for Adding Include Files. We can use a similar technique to remove the heading information from the references entirely, by setting caption_counter_depth to zero:
Games0.ipc: Overriding the caption_counter_depth, setting the depth to 0.
caption_counter_depth = 0
includes = ['Games.ipc']
These references show only the figure counter for the whole document.
The cases shown here are well-behaved. When caption_counter_depth is 2, all the references live in a heading at least two levels deep. When it is zero, there cannot be any problems at all. But if caption_counter_depth is set to a number that is greater than the outline depth of the heading containing the reference, the missing levels are ignored:
Games3.ipc: Overriding the caption_counter_depth, setting the depth to 3.
caption_counter_depth = 3
includes = ['Games.ipc']
In this case, the references will look identical to the ones with caption_counter_depth set to 2.
To turn off the truncation of captions entirely, and just count references within each nested level of subheading independently, set caption_counter_depth to None:
GamesNone.ipc: Overriding the caption_counter_depth, unsetting the depth entirely.
caption_counter_depth = None
includes = ['Games.ipc']
The result will be identical with the case where caption_counter_depth is 2 for this particular example as well, but in general, the heading portion of the reference will not be constrained (similar to section headings). The counter will restart for any heading that is encountered in the document.
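The truncation behavior can be sketched as follows. Both functions are hypothetical, and the output format is an assumption modeled on the “Figure 1.3-1” style of reference name. The figure counter is passed in rather than computed, since the sketch covers only how the heading portion is truncated.

```python
def heading_prefix(heading_numbers, depth):
    """Keep at most `depth` heading levels; None keeps all of them."""
    if depth is None:
        levels = heading_numbers
    else:
        # Slicing past the end ignores missing levels, as with a depth of 3
        # inside a second-level heading.
        levels = heading_numbers[:depth]
    return '.'.join(str(number) for number in levels)


def reference_name(kind, heading_numbers, counter, depth=1):
    """Build a reference name such as 'Figure 1.2-1' (assumed format)."""
    prefix = heading_prefix(heading_numbers, depth)
    if prefix:
        return '{} {}-{}'.format(kind, prefix, counter)
    return '{} {}'.format(kind, counter)


# A figure under heading 1, subheading 2, for each of the depths discussed.
for depth in (1, 2, 0, 3, None):
    print(depth, reference_name('Figure', (1, 2), 1, depth=depth))
```

For this two-level outline, depths of 2, 3 and None all give the same name, which matches the behavior described for Games2.ipc, Games3.ipc and GamesNone.ipc.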
There are plenty of other pathological cases out there in terms of missing heading levels. The reader is assured that Imprint handles all of them consistently, and is left with the exercise of verifying that assertion for themselves. For a starting point, see the obscure PathologicalCases project.
Using Roles¶
Roles allow tags to impersonate each other as reference targets. The most common usage is to turn tables or equations into figures that can be referenced as “Figure 1.3-1”, rather than being treated as a table or equation.
Our sample template creates such a <table> to describe Tic-Tac-Toe:
Games.xml, lines 43-44.
  <par list="num"><run style="Default Paragraph Font">Tic Tac Toe</run></par>
  <table id="tic_tac_toe" handler="imprint.handlers.table.CSVFile" role="figure" />
This table can only be referenced through a <figure-ref> tag, rather than the usual <table-ref>:
Games.xml, lines 13-15, emphasizing the unusual <figure-ref> tag.
      checkers and chess, respectively. These are some of the most common
      games played on a static pre-made board. <figure-ref id="tic_tac_toe"/>
      and <figure-ref id="battleship"/> show tic-tac-toe and battleship
Any tag, whether it is normally referenceable or not, can impersonate a role. For example, all it takes for a <latex> equation to become a figure is the addition of an attribute: role="figure". That being said, not all roles are suitable for every tag. For example, the PathologicalCases project has an example of a <table> that plays the role of a heading with role="par". This introduces the problem that <table> should not contain text, and so normally cannot be referenced by <segment-ref>’s title attribute.
Additional Topics, Part 2¶
This tutorial covers some of the topics not covered in the Basic Tutorial. The focus here is on how to set up proper styling through DOCX Stub and how to utilise plugins to their fullest potential. A basic understanding of the topics covered in the Basic Tutorial is assumed. A passing understanding of the concepts in Writing Plugins may be required for an in-depth understanding.
For a tutorial covering topics more targeted towards content and configuration through XML Template and IPC File, see Additional Topics, Part 1.
Project¶
The project for this tutorial is Invoice. The discussion will only focus on the relevant portions of the relevant files, so readers are encouraged to download and extract the entire project before delving into the tutorial.
There will also be sections that demonstrate how to work through the MS Word user interface, as well as some XML formatting in a text editor.
The document created in the project will contain a made-up customer invoice, along with a letter to the customer.
The project uses two custom plugins and one built-in one to process the data. The plugins are implemented in invoice.py and registered in Company.iif. If you have not done so already, read through the Using Your Plugin portion of the Writing Plugins tutorial.
Image Logging¶
Images that are generated for the document can be “logged” by copying them into the log directory or, if conventional logging is disabled, into the document output directory. Image logging also applies to strings, LaTeX equations, and sometimes tables (all the common handlers implement it). For common handlers that just insert images or table data as-is into a document, this is not much of an advantage. However, when a figure handler generates a complex image or chart from scratch, it is often useful to have it output to disk as well as using it from memory.
Image logging is controlled by the log_images system keyword in the IPC File:
Invoice.ipc, lines 18-19, showing the log_images setting.
log_file = True
log_images = True
Image logging is not enabled by default. With logging turned on, you will see the following additional files in your output directory:
Invoice_authorized_signature.png
This is the only actual image that is logged. It is a copy of the authorization signature that is inserted by the <figure> tag in the XML Template:
Invoice.xml, lines 57-65, emphasizing where the signature is generated.
  <par style="Normal">
    <run>Kindest Regards,</run>
  </par>
  <par style="Figure Container">
    <figure id="authorized_signature" handler="imprint.handlers.figure.ImageFile"/>
  </par>
  <par style="Normal">
    <run><kwd name="AuthorizedSigner"/></run>
  </par>
Invoice_damage_assessment.txt
This is the output of the <string> tag in the XML Template. Strings are dumped into a text file for inspection, since they are generated content, like images.
Invoice.xml, lines 25-29, emphasizing where the custom string is inserted.
  <par style="Normal">
    <run>
      <string id="damage_assessment" handler="invoice.damage_assessment"/>
    </run>
  </par>
Invoice_financial_data.csv
This is a copy of the financial data that is used to do the damage assessment and to generate the actual invoice. It is generated in response to the <table> tag in the XML Template:
Invoice.xml, lines 84-92, emphasizing where the invoice table is generated.
  <par style="Normal">
    <run>Transaction Date: </run>
    <run style="Strong"><kwd name="InvoiceDate" format="%Y-%b-%d"/></run>
  </par>
  <table handler="invoice.invoice_table" id="financial_data" style="Plain Table 1" />
  <par style="Post Table">
    <run>Payment in full is due on </run>
    <run style="Strong"><kwd name="DueDate" format="%Y-%b-%d"/></run>
  </par>
Tables are not required to dump their data unless it really makes sense to do so. Due to the relatively flexible structure of tables in Word documents, the plugin itself is responsible for how the data is to be written. Other plugins rely on the tag to do their logging for them.
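The three file names listed above appear to combine the output document name with the id of the tag that produced the content. The sketch below illustrates that pattern. The function `logged_name` is hypothetical, the pattern is inferred from the listed names rather than taken from a specification, and the output document is assumed to be named Invoice.docx.

```python
from pathlib import PurePosixPath


def logged_name(output_docx, tag_id, extension):
    """Build a log file name of the form <document stem>_<tag id><extension>."""
    stem = PurePosixPath(output_docx).stem
    return '{}_{}{}'.format(stem, tag_id, extension)


# Tag ids taken from the Invoice.xml excerpts; 'Invoice.docx' is an assumption.
names = [
    logged_name('Invoice.docx', 'authorized_signature', '.png'),
    logged_name('Invoice.docx', 'damage_assessment', '.txt'),
    logged_name('Invoice.docx', 'financial_data', '.csv'),
]
print(names)
```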
Todo
Some of the last paragraph above probably belongs in the plugin tutorial, not here.
Line and Page Breaks¶
The built-in tags support two types of breaks: line and page breaks. Both are
to be found in the Invoice
sample project.
Line Breaks¶
Line breaks are placed directly in a run of text using the <n> tag:
Invoice.xml, lines 39-44, showing how line breaks are inserted.¶
<par style="List Paragraph">
    <run><kwd name="AddressAttn"/><n/>
         <kwd name="Address1"/><n/>
         <kwd name="Address2"/><n/>
         <kwd name="Address3"/></run>
</par>
The result is a single run of text, but broken over multiple lines in a controlled manner:
Line breaks can only appear in a run of text. If they appear anywhere within a <par> tag, an attempt will be made to find or even create a suitable run for the line break. However, outside a paragraph, <n> gets ignored completely, with a warning.
Page Breaks¶
Unlike line breaks, page breaks can appear just about anywhere. This includes <run> and <par> tags, as well as the document root.
Page breaks are inserted with a <break> tag:
Invoice.xml, lines 63-72, showing how a page break can be used.¶
<par style="Normal">
    <run><kwd name="AuthorizedSigner"/></run>
</par>
<break/>
<!-- Second (Invoice) Page -->
<par style="Title">
    <run>Customer Invoice</run>
</par>
The page break in this example separates the signature in the preface letter from the page containing the actual customer invoice. Usually, page breaks appear between paragraphs, as in this example, but that is not a requirement.
When a page break cuts a run or paragraph in two, a new paragraph and/or run with the same style is created on the next page.
Styles and Formatting¶
This section demonstrates how to apply styles and formatting to the document at every level.
Topics Covered:
Applying Styles¶
Paragraphs¶
Headings¶
Lists¶
The Making Lists tutorial explains how to create lists. For simple lists, a default paragraph style is automatically selected, based on whether the list is numbered or bulleted. Anything more complicated will require explicitly setting a style.
A good example of when to use explicit list styles is when a list item contains multiple paragraphs. Consider the following snippet:
<par list="num">
    <run>
        This is an example of a list item containing multiple
        paragraphs.
    </run>
</par>
<par style="List Continue">
    <run>The second paragraph is part of the first list item.</run>
</par>
<par list="cont">
    <run>
        The third paragraph continues the numbering of the list where
        we left off.
    </run>
</par>
The result is a multi-paragraph list item for item #1. If we had not explicitly set the List Continue style on the middle paragraph, its indentation would not have been correct for a list item:
Todo
Add a big blurb about the fact that this only works because the default list styles are sensibly set in the global defaults file. If not, the most default default list style is actually not very useful (indents by 4 tabs).
Writing Plugins¶
Imprint is all about customization, and the Plugins Layer is the crux of that customization. But what exactly is a plugin, and how do you write one? This tutorial aims to provide a step-by-step, hands-on introduction to the types of plugins that are supported, and how to write them.
A plugin is a callable
that creates the dynamic
content that gives Imprint its power. Each plugin fulfills a particular
interface, defined by the XML tag that it is bound to. There are three main
types of content that can be generated by default: Figures,
Tables and Strings. Custom tags that support
plugins can be created as well. This advanced topic is covered in the
content_tutorial.
While different types of plugins are different from each other, there are a few common features they all share. The first two arguments to each of the built-in Handlers are the dictionary of Keywords and the Data Configuration. The remaining arguments depend on the specific tag. Custom tags are not strictly required, but highly encouraged, to follow this convention.
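As a rough illustration of this convention, a string plugin might look like the sketch below. The function name, the `Customer` keyword, and the `amount` and `threshold` keys are hypothetical. It is also an assumption, not confirmed here, that a string handler takes no further arguments and returns the text to insert.

```python
def damage_summary(kwds, config):
    """Hypothetical string plugin using the (keywords, data configuration) order."""
    # kwds: mapping of keyword names to values (assumed to come from the IPC file).
    # config: the data-configuration mapping selected for this tag (assumed).
    amount = config.get('amount', 0)
    threshold = config.get('threshold', 1000)
    severity = 'major' if amount >= threshold else 'minor'
    return f"{kwds['Customer']} reported {severity} damage."


# Stand-alone call with made-up inputs, outside of Imprint:
text = damage_summary({'Customer': 'ACME'}, {'amount': 2500})
```

Keeping the first two parameters in this order means the same function can be exercised directly in a test, without running the engine.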
Topics Covered:
Tables¶
Tables are generally the most complex type of plugin for the builtin tags, since they have to modify the document in-place as they generate their content. This leads to interesting artifacts, like partially generated tables in case of an error. Broken Figures and Strings are entirely replaced by alt-text, but tables will generally be generated up to the point where the error occurred.
It also means that the <table> tag does not handle data logging, instead leaving the task up to the discretion of individual plugins. This is very different from the simpler plugin tags like <figure> and <string>, which handle the data logging in a uniform manner, without delegation to a plugin.
Using Your Plugin¶
You made a plugin. Now what? How do you use it in the template you just created?
This is a two-step process. First you have to let Imprint and Python know where your plugin lives. Second, you have to refer to the plugin in your template somehow. Both steps are covered in detail in the next sections:
Registering Your Plugin¶
To register a plugin, you must place it in the Python Path. This is normally done with something like
import sys
sys.path.insert(0, 'path/to/plugin/module')
It is often convenient to put such a registration into a dedicated IIF File.
Todo
This has been totally changed by the ??? keyword.
Todo
Add an example
Note
Keep the import as import sys rather than from sys import path, since the latter will add a keyword to your namespace, while a module will be ignored after loading.
Referencing Your Plugin¶
Once a plugin is in your Python Path, you can reference it as you would any
other module in your tag’s handler
attribute.
Todo
Add an example
Writing Custom Tags¶
Topics Covered:
Writing a TagDescriptor
¶
Todo
Add the following:
> This tutorial covers the creation of a basic XML tag. It does not delve
> into the subject of tags with plugins. This advanced
> topic is covered in the content_tutorial.
Making Your Tag Built-In¶
If you end up writing a tag that you believe is generic and useful enough to be built-in, feel free to submit a pull request or patch to the author. Be sure to include all, or at least most, of the following items:
- A properly documented implementation of your tag in the imprint.core.tags module.
- A proper entry in XML Template Specification.
- At least a brief mention of your tag in at least one of the tutorials.
- Proper tests, once that becomes a thing.
Demos¶
The tutorials in this documentation rely on a number of small demo projects to illustrate the features of Imprint. The projects are available for download as zip files so that the reader can follow along in the tutorial and experiment on their own. The following is a list of the available demo projects:
- HelloWorld for Getting Started
- CandleFlame for Basic Tutorial
- Games for Additional Topics, Part 1
- Invoice for Additional Topics, Part 2
- Snippets for Styles and Formatting
- … for Writing Plugins
- … for Writing Custom Tags
- PathologicalCases for testing purposes
Reference¶
This part of the documentation is the specification of the various components and interfaces of Imprint. For examples and clearer usage instructions, consult one of the pages in the Tutorials section.
Imprint Configuration Files¶
This page contains a summary of the different files that users must provide to have imprint operate properly. Most of the files have their own reference pages and tutorial sections.
The different types of files are normally referred to by their extension. However, since files are always referred to internally by their full name, none of the extensions listed here are actually mandatory. They are a default choice made for clarity and aesthetics, not functionality.
Contents
IPC File¶
The Imprint Program Configuration (IPC) file is the main script for a given output of imprint. It contains a set of Keywords mapped to values. Some of the keywords reference the other configuration files and configure the Engine Layer; others provide the user-defined data for content generation. The former are referred to as System Keywords, while the latter are User-Defined Keywords.
The file is written using Python syntax. Keywords are normal Python names. All
the restrictions that apply to Python variable names apply to keyword names.
Traditionally, System Keywords which direct the operation of the
Engine Layer start with lowercase letters, while
User-Defined Keywords containing per-document data start with uppercase letters.
Any keyword starting with a dunder (double underscore / __
) is for internal
use by the configuration file, and will not be exposed to the core at all.
Modules imported into the configuration will not be exposed either.
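The loading rules above can be sketched in plain Python. This only illustrates the described behavior and is not Imprint's actual loader; `load_keywords` and the sample keyword values are made up for the example.

```python
import types

# Hypothetical IPC content: lowercase system keywords, CamelCase user keywords,
# one dunder helper name, and one imported module.
IPC_SOURCE = '''
import datetime
__today = datetime.date(2020, 1, 15)
input_xml = 'Invoice.xml'
output_docx = 'Invoice.docx'
InvoiceDate = __today
'''


def load_keywords(source):
    """Execute configuration text and keep only the names exposed as keywords."""
    namespace = {}
    exec(source, namespace)
    return {
        name: value
        for name, value in namespace.items()
        # Dunder names and imported modules are dropped from the result.
        if not name.startswith('__') and not isinstance(value, types.ModuleType)
    }


keywords = load_keywords(IPC_SOURCE)
```

Here `__today` and `datetime` are available while the file runs, but neither appears in `keywords`.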
Paths¶
Relative paths are resolved from the directory containing the IPC File, not the current directory. This makes it easy to copy entire configurations to different locations, and have them work out of the box. It also allows a user to generate multiple documents correctly without changing directories, and generally removes any dependence on the current directory.
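A minimal sketch of this resolution rule, using `pathlib`. The helper name and the file names are hypothetical. That an absolute path is returned unchanged is `pathlib` behavior; it is an assumption that Imprint treats absolute paths the same way.

```python
from pathlib import PurePosixPath


def resolve_path(ipc_file, value):
    """Resolve value against the directory containing ipc_file."""
    # Joining a directory with an absolute path yields the absolute path unchanged.
    return PurePosixPath(ipc_file).parent / value


resolved = resolve_path('/reports/q1/Invoice.ipc', 'templates/Invoice.xml')
absolute = resolve_path('/reports/q1/Invoice.ipc', '/shared/Stub.docx')
```

The current working directory never enters the computation, which is what makes a configuration directory relocatable.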
In particular, this applies to the following system keywords, which are expected to contain a path or paths:
IDC File¶
An Imprint Data Configuration (IDC) file complements the core configuration of the IPC File by supplying the data configuration mappings for the Plugins Layer. The data configuration is referenced by the data_config keyword.
Like the IPC File, the IDC File uses Python
syntax. It follows a similar loading convention of removing any names starting
with a dunder (double underscore / __
) from the loaded namespace. Unlike the
IPC File, recursive includes are not allowed.
Each name in the global namespace of the IDC File corresponds to a plugin configuration. Normally, all the visible names in the file are Python dictionaries, but other mapping types are allowed.
The builtin <figure>, <table> and <string> tags support plugins. The plugins are structured so that unnecessary keys are silently ignored, making it possible to share data configuration across multiple tags. For example, a figure and a table generated from the same data set can share a data configuration, and therefore avoid the redundancy of repeated data source specs.
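The idea of sharing one configuration can be sketched as follows. The dictionary name, its keys, and the two helper functions are all hypothetical; real plugins define their own keys.

```python
# Hypothetical data configuration, as it might appear in an IDC file.
flame_temperature = {
    'data_file': 'flame.csv',        # read by both consumers
    'figure_title': 'Temperature',   # only the figure code reads this
    'table_columns': ['t', 'temp'],  # only the table code reads this
}


def figure_settings(config):
    """Pick out the keys a figure plugin needs and ignore the rest."""
    return {'source': config['data_file'], 'title': config['figure_title']}


def table_settings(config):
    """Pick out the keys a table plugin needs and ignore the rest."""
    return {'source': config['data_file'], 'columns': config['table_columns']}


fig = figure_settings(flame_temperature)
tab = table_settings(flame_temperature)
```

Because each consumer reads only the keys it knows about, the data source is specified once and reused.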
Configuration Names¶
Plugin tags in the XML Template are mapped to their configuration
objects by a special attribute, usually id
. The name of the attribute is
set for each plugin’s descriptor.
A missing configuration aborts the generation of its particular content, but does not necessarily constitute a fatal error.
IIF Files¶
Imprint Include Files (IIF) have the exact same format as the main IPC File. Their purpose is to share content between multiple document configurations, using the includes keyword.
Include files are intended to supplement the main configuration file. The main configuration automatically overrides any duplicate keys that are found in the includes.
Includes may be done recursively. Since the engine does not check for infinite loops, use this feature carefully.
XML Template¶
The XML template defines the structure and content of the document. A full specification of the XML structure is given in XML Template Specification. Additional features can be added through the XML Tag API. The template is referenced by the input_xml keyword.
DOCX Stub¶
The DOCX stub is an empty template document that defines all of the styles and formatting. All the styles referenced explicitly in the XML Template, as well as the implicit default styles must exist in the stub. The stub is also responsible for setting up the page numbering, headers and footers. The stub is referenced by the input_docx keyword.
Keywords¶
The engine is configured through the IPC File and the IIF Files it includes. These files supply a set of keywords associated with values. There are two types of keywords: System Keywords and User-Defined Keywords.
Contents
System Keywords¶
System keywords configure the behavior of the engine and plugins. System keywords are conventionally identified by the lowercase_with_underscore naming scheme.
Most system keywords are optional, with sensible defaults used in case they are omitted. There are a few mandatory keywords that will result in an error if they are not supplied:
- data_config Required for all documents that use plugins.
- input_xml
- output_docx
The following is a complete listing of known system keywords. Custom tags and plugins may define additional keywords (or use existing ones for their own purposes).
caption_counter_depth
¶
The number of elements to include in the caption of a generated figure, table or heading reference before the object number. For example, say we have a figure that is the second figure under heading 1.2.3. Let’s also say that it is the fifth figure under heading 1.2 and the 10th in the first section. In that case, the following table shows the resulting reference captions for different values of this keyword:
caption_counter_depth | caption |
---|---|
None | 1.2.3-2 |
0 | 10 |
1 | 1-10 |
2 | 1.2-5 |
3+ | 1.2.3-2 |
This keyword is optional. It defaults to 1.
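The table above can be reproduced with a short sketch. The function and its inputs are hypothetical, and the `.` and `-` separators are taken from the example rather than from any configuration.

```python
def reference_number(heading, counts, depth=None):
    """Build a reference number such as '1.2-5'.

    heading: heading numbers of the enclosing section, e.g. (1, 2, 3).
    counts: counts[d] is the object's ordinal among the objects that share
        the first d heading numbers.
    depth: the value of caption_counter_depth; None means the full depth.
    """
    levels = len(heading) if depth is None else min(depth, len(heading))
    prefix = '.'.join(str(n) for n in heading[:levels])
    number = str(counts[levels])
    return f'{prefix}-{number}' if prefix else number


# The figure from the example: second under 1.2.3, fifth under 1.2,
# tenth in section 1 (and, in this example, tenth in the document).
heading = (1, 2, 3)
counts = [10, 10, 5, 2]
captions = {d: reference_number(heading, counts, d) for d in (None, 0, 1, 2, 3, 4)}
```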
data_config
¶
The name of the IDC File that configures the entire Plugins Layer.
This keyword is mandatory if the XML Template specifies content generated by plugins, and completely ignored otherwise.
date
¶
This keyword has no special meaning. However, it is implicitly set to the
result of datetime.datetime.now
when headers and footers are
processed, if not set explicitly to something else. This makes it simpler to
include information about the time of generation into the
Headers and Footers. The implicitly-defined value is not
available at any point besides the final keyword replacement step for
Headers and Footers.
This keyword is optional.
file_level
¶
The minimum cutoff level to dump to the file. To dump everything to the file
use logging.NOTSET
, 0
, or 1
. The value can be a (case insensitive)
string level name, a number or one of the constants in the logging
module.
If log_file is missing, this level will be ignored and nothing will be written to a file.
This keyword is optional. It defaults to logging.NOTSET
.
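One way to accept all three forms of a level is sketched below. `normalize_level` is a hypothetical helper, not part of Imprint; it relies only on the level constants of the standard `logging` module.

```python
import logging


def normalize_level(level):
    """Convert a level name, a number, or a logging constant to an int."""
    if isinstance(level, str):
        # Names are matched case-insensitively, e.g. 'warning' -> logging.WARNING.
        value = getattr(logging, level.upper(), None)
        if not isinstance(value, int):
            raise ValueError(f'Unknown logging level: {level!r}')
        return value
    return int(level)


levels = [normalize_level(x) for x in ('warning', 'ERROR', logging.NOTSET, 1)]
```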
See also
includes
¶
A sequence of include file names. Include files can only add new keywords to the existing configuration. They do not overwrite any keywords that are already set. The loading order therefore matters: include files are loaded breadth-first, in the order that they appear in the sequence.
This keyword is optional.
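The add-only, breadth-first behavior can be sketched as below. This is an illustration under stated assumptions, not the engine's code: `merge_includes` is hypothetical, include files are represented by an in-memory mapping instead of being read from disk, and the sketch skips files it has already seen, which is a safeguard added for the example.

```python
from collections import deque


def merge_includes(main, include_files):
    """Merge keywords so that the earliest definition of a keyword wins.

    main: keywords from the main configuration.
    include_files: mapping of file name to that file's keywords.
    """
    merged = dict(main)
    queue = deque(main.get('includes', ()))
    seen = set()
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # safeguard against include loops, added for this sketch
        seen.add(name)
        loaded = include_files[name]
        for key, value in loaded.items():
            merged.setdefault(key, value)  # add new keywords only
        queue.extend(loaded.get('includes', ()))  # nested includes go last
    return merged


main = {'includes': ['a.iif', 'b.iif'], 'Customer': 'ACME'}
files = {
    'a.iif': {'includes': ['c.iif'], 'Customer': 'Other', 'Currency': 'USD'},
    'b.iif': {'Currency': 'EUR', 'Signer': 'J. Doe'},
    'c.iif': {'Signer': 'Nobody', 'Footer': 'Confidential'},
}
merged = merge_includes(main, files)
```

In this example `b.iif` is processed before `c.iif`, even though `c.iif` is included by `a.iif`, so `Signer` comes from `b.iif`.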
input_docx
¶
The name of the DOCX Stub to use as a style and formatting template in the Templates Layer. All the styles referenced explicitly by the XML Template and implicitly in the User Defaults File must be present in this file. This file must also contain all the required formatting for Headers and Footers.
This keyword is optional. The default is the empty document provided by python-docx.
input_xml
¶
The name of the XML Template file to use as a content and layout template in the Templates Layer. This template must follow the specification laid out in XML Template Specification. It may contain additional tags, loaded through the tags mapping.
This keyword is mandatory.
log_file
¶
The name of the output file to write to. All messages with level greater than or equal to file_level will be written to the named file.
If boolean True
, a file with the same name as
output_docx, but with a .log
extension will be
created.
This keyword is optional. If omitted, a log file will not be written, and file_level is ignored.
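The naming rule for a boolean True value can be sketched with `pathlib`. The helper name and file names are hypothetical.

```python
from pathlib import PurePosixPath


def log_file_name(log_file, output_docx):
    """Return the log file name to use, or None if no file should be written."""
    if log_file is True:
        # Same name as the output document, with a .log extension.
        return str(PurePosixPath(output_docx).with_suffix('.log'))
    return log_file or None


derived = log_file_name(True, 'out/Invoice.docx')
explicit = log_file_name('run.log', 'out/Invoice.docx')
disabled = log_file_name(None, 'out/Invoice.docx')
```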
See also
log_format
¶
A string that determines the contents of each line of the log file. The format
of this string is the same as for the fmt attribute of a
logging.Formatter
. It uses %
interpolation syntax, with all the
logging.LogRecord
attributes as valid keyword replacements.
This keyword is optional. If omitted, the log message will be formatted
according to '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
.
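To see what the default format produces, the string can be passed to a standard `logging.Formatter` directly. The record below is built by hand with made-up values purely for illustration.

```python
import logging

DEFAULT_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

formatter = logging.Formatter(fmt=DEFAULT_FORMAT)

# A hand-made record; in normal use, logger calls create these automatically.
record = logging.LogRecord(
    name='demo', level=logging.WARNING, pathname='demo.py', lineno=1,
    msg='Disk almost full', args=None, exc_info=None,
)
line = formatter.format(record)
```

The resulting line starts with a timestamp, followed by the logger name, the level name and the message, separated by ` - `.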
See also
log_images
¶
Whether or not to log images in separate files, in addition to inserting them into the document. Evaluated as a boolean, regardless of the actual type of the value. It is up to individual tag handlers to respect this setting. This setting is independent of the other logger settings.
This keyword is optional. If omitted, it normally defaults to falsy, but custom tags may choose to interpret it differently.
See also
log_stderr
¶
Whether or not to print log output to the standard error stream. Evaluated as a boolean, regardless of the actual type of the value. If truthy, all messages with level greater than or equal to stderr_level are written to standard error.
This keyword is optional. If omitted, it defaults to falsy, and stderr_level is ignored.
See also
log_stdout
¶
Whether or not to print log output to the standard output stream. Evaluated as a boolean, regardless of the actual type of the value. If truthy, all messages with level greater than or equal to stdout_level are written to standard output.
If falsy, stdout_level is ignored.
If log_stderr is set to truthy along with this keyword, then messages with a logging level greater than or equal to stderr_level will not be sent to standard output.
This keyword is optional. It defaults to truthy.
See also
output_docx
¶
The name of the generated document. If a file with the same name already exists, the program’s behavior is determined by the overwrite_output keyword.
This keyword is mandatory.
overwrite_output
¶
Determines how to handle the case where the file named by output_docx already exists. The following options are recognized:
'raise'
- Raise an error and abort.
'rename'
- Keep prompting the user for a new file name until they select one that does not already exist. A default suggestion is generated, which can be selected automatically.
'silent'
- Overwrite the existing file without further comment.
'warn'
- Overwrite the existing file, but with a warning.
Any other value will trigger a fatal error.
This keyword is optional. It defaults to 'raise'
.
stderr_level
¶
The minimum threshold for messages that go to the standard error stream. This
acts as an (exclusive) upper threshold for messages sent to the standard output
stream as well. This level does not affect the level being logged to the file.
The value can be a (case insensitive) string level name, a number or one of the
constants in the logging
module.
If log_stderr is missing or falsy, this level will be ignored.
This keyword is optional. It defaults to logging.ERROR
.
See also
stdout_level
¶
The minimum threshold for messages that go to the standard output stream. This
level does not affect the level being logged to the file. The value can be a
(case insensitive) string level name, a number or one of the constants in the
logging
module.
If log_stderr is truthy, stderr_level provides the exclusive upper threshold for messages sent to standard output.
If log_stdout is falsy, this level will be ignored.
This keyword is optional. It defaults to logging.WARNING
.
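The interaction between the two thresholds can be sketched with standard `logging` handlers. This assumes that both log_stdout and log_stderr are truthy. The logger name, the filter class and the use of in-memory streams are choices made for the example, not Imprint's implementation.

```python
import io
import logging


class BelowLevel(logging.Filter):
    """Pass only records whose level is strictly below an upper threshold."""

    def __init__(self, upper):
        super().__init__()
        self.upper = upper

    def filter(self, record):
        return record.levelno < self.upper


def build_logger(stdout, stderr,
                 stdout_level=logging.WARNING, stderr_level=logging.ERROR):
    logger = logging.getLogger('routing_demo')
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    out = logging.StreamHandler(stdout)
    out.setLevel(stdout_level)               # inclusive lower threshold
    out.addFilter(BelowLevel(stderr_level))  # exclusive upper threshold

    err = logging.StreamHandler(stderr)
    err.setLevel(stderr_level)

    for handler in (out, err):
        handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
        logger.addHandler(handler)
    return logger


out_stream, err_stream = io.StringIO(), io.StringIO()
log = build_logger(out_stream, err_stream)
log.info('below both thresholds')
log.warning('goes to standard output')
log.error('goes to standard error')
```

With the default levels, warnings reach only the first stream and errors reach only the second, so no message is printed twice.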
See also
tags
¶
Sets up user-defined tags for the XML Template. This is a mapping of tag names to user-defined Tag Descriptors. Values may be strings containing the fully-qualified names of the object to import, or the objects themselves. Both of the values in the following example are valid:
import my.custom.module

tags = {
    'tag1': my.custom.module.descriptor,    # the descriptor object itself
    'tag2': 'my.custom.module.descriptor',  # the fully-qualified name as a string
}
This keyword is optional.
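Accepting either a name or an object can be sketched with `importlib`. The `resolve` helper is hypothetical, and a standard-library class stands in for a real descriptor; how Imprint performs the lookup internally is not specified here.

```python
import collections
import importlib


def resolve(value):
    """Return the object named by a dotted string, or the object itself."""
    if isinstance(value, str):
        # Split 'package.module.attribute' into module and attribute parts.
        module_name, _, attribute = value.rpartition('.')
        module = importlib.import_module(module_name)
        return getattr(module, attribute)
    return value


from_name = resolve('collections.OrderedDict')
from_object = resolve(collections.OrderedDict)
```

Both calls yield the same object, which is why the two forms are interchangeable in a mapping like the one above.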
User-Defined Keywords¶
User-defined keywords provide the data used to perform keyword replacements for the <kwd> tags in the XML Template. They provide the per-report configuration of the basic content. User-defined keywords are conventionally identified by a CamelCase naming scheme. While the naming is not strictly a requirement, any lowercase_and_underscore name is automatically reserved for use as a system keyword.
Computed Keywords¶
In addition to direct definition in the IPC File / IIF Files and insertion via the <kwd> tag, keywords can be computed through the <expr> tag in the XML Template. The namespace in which an <expr> tag is evaluated is the existing mapping of keywords defined up to that point. The result is a new user-defined keyword. System keywords placed in an <expr> tag are not guaranteed to work correctly. This form of computation is provided to de-clutter the IPC File, and avoid information redundancy in the frequently edited file.
XML Template Specification¶
The XML template used by Imprint contains the static portions of the text of the final document, along with all the placeholders for dynamically generated content.
There is no DTD or XMLNS for the template, for two reasons. All validation is done internally by the Imprint core, in a manner that is as lenient as possible. Any errors that can be forgiven, will be, with a warning and a logged message. Additionally, it is possible to use the XML Tag API to extend the capabilities of the core processor without requiring modification of a hard-coded standard.
The XML format used by Imprint does not allow namespaces. Namespace tags will be ignored with a warning, even if they are registered through the XML Tag API.
Warning
All tag and attribute names are case-sensitive. All builtin tags and attributes are lowercase. Names must appear in the XML exactly as shown in the spec.
Contents
Root¶
The file root is always the <imprint-template>
tag. That being said, there
is a proposal to make it configurable: Configurable XML Root Tag.
Attributes¶
Normally, each tag has a set of required and optional attributes. Omitting a required attribute immediately triggers a fatal error. Omitting an optional attribute just sets the default value when processing. Any extra attributes that are neither required nor optional are logged but otherwise completely ignored. In the tag descriptions below, all attributes are mandatory, unless suffixed by opt for “optional”.
In addition to the normal attributes that any tag may have, there are attributes that are processed by the engine itself. Currently, there is one such attribute:
role
¶
Define the role of a tag and immediately make it referenceable. The role is the name of another tag that is referenceable by design. Among the builtin tags, <figure>, <table>, and sometimes <par> are referenceable by design. For more details on references, see the relevant section in the Tag API.
Normally, referenceable tags identify the target with an id
attribute. Defining a role
on a custom tag therefore implies that it must
also have an id
attribute in that case. Among the builtin tags,
<segment-ref> is an exception, in that it requires either an
id
or a title
. A tag with role="par"
therefore does not require an
id
attribute. The rules for custom tags are defined similarly: the check for
target identification attributes depends on what the role supports.
Tags¶
<break>
¶
Insert a page-break. If placed in the middle of a run, this will be a true page break. Otherwise, this will be a section break that starts a new page.
Attributes¶
None
Content¶
<expr>
¶
Evaluate a Python expression and create a new keyword. This tag can appear anywhere in the document. It temporarily suspends normal processing. Any text inside this tag will be evaluated as a Python expression, and the result will be assigned to the named keyword. All existing keywords, including those from prior <expr> tags, are available in the evaluation namespace.
Keywords computed in this manner are treated the same as User-Defined Keywords and become effective as soon as the closing tag is reached, but not before. It is therefore common practice to put all of the expressions at the beginning of the XML Template.
The purpose of this tag is to abstract away common boiler-plate keywords that depend entirely on other keywords into the XML Template to avoid as much redundancy as possible.
System Keywords should never be set with this tag. System values may be used before the XML file is read, and may therefore not work as intended for this and other reasons.
Warning
This tag runs arbitrary Python code, with direct access to the keyword definitions. Avoid making assignments within the tag itself (even implicit ones) unless you really know what you are doing!
Warning
Any coding errors in the content of this tag will cause a fatal error.
Attributes¶
- name : Python Identifier
- The name of the new keyword to create.
- importsopt : List of module names
- A space-separated list of modules to import before evaluating the expression in the tag. Failed imports will be logged as an error.
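The evaluation model described for this tag can be sketched in a few lines. This is an approximation under stated assumptions, not the engine's code: `evaluate_expr` is hypothetical, it handles only top-level module names in the imports list, and it evaluates on a copy of the keywords.

```python
import datetime
import importlib


def evaluate_expr(name, expression, keywords, imports=''):
    """Evaluate expression with the keywords as its namespace and store the result."""
    namespace = dict(keywords)  # copy, so eval() cannot add names to keywords
    for module_name in imports.split():
        namespace[module_name] = importlib.import_module(module_name)
    keywords[name] = eval(expression, namespace)
    return keywords[name]


keywords = {'InvoiceDate': datetime.date(2020, 1, 15)}
evaluate_expr('DueDate', 'InvoiceDate + datetime.timedelta(days=30)',
              keywords, imports='datetime')
```

After the call, `DueDate` is available like any other keyword, while `InvoiceDate` is unchanged.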
<figure>
¶
Generates a figure using the selected handler, and inserts it into the document. If Image Logging is enabled, a separate file with the image will be generated as well.
Figures are referenceable through the <figure-ref> tag.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the figure. The name must appear in the IDC File. This is also the ID used by the <figure-ref> tag to link back to this tag.
- handler :
str
- The full name of the figure handler class that will generate the content.
- styleopt : Character Style
- The name of the style of the run containing the figure. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
- pstyleopt : Paragraph Style
- The name of the style of the paragraph containing the figure. Must be defined in the DOCX Stub and be a paragraph style.
- widthopt :
int
+{'in', 'px', 'cm', 'mm', 'pt', 'emu'}
- The width of the figure. Units are optional, and default to inches
(
'in'
). Suffixes can be separated from the number by optional whitespace.
- heightopt :
int
+{'in', 'px', 'cm', 'mm', 'pt', 'emu'}
- The height of the figure. Units are optional, and default to inches
(
'in'
). Suffixes can be separated from the number by optional whitespace.
The attributes handler
, style
, pstyle
, width
and height
can
be overridden by keys with the same name in the Data Configuration
for the figure. If neither width
nor height
is specified, the figure
will be inserted as-is. If only one of them is specified, the figure will be
scaled proportionally.
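The size syntax and the scaling rule can be sketched as follows. Both helpers are hypothetical. The parser follows the `int` type shown in the attribute listing and accepts lowercase unit suffixes only, which is an assumption.

```python
import re

UNITS = ('in', 'px', 'cm', 'mm', 'pt', 'emu')
_SIZE = re.compile(r'\s*(\d+)\s*([a-z]*)\s*')


def parse_size(text):
    """Split a size such as '40 mm' into (40, 'mm'); units default to inches."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f'Invalid size: {text!r}')
    number, unit = match.groups()
    unit = unit or 'in'
    if unit not in UNITS:
        raise ValueError(f'Unknown unit: {unit!r}')
    return int(number), unit


def fit(native, width=None, height=None):
    """Return the final (width, height), preserving the aspect ratio if one is missing."""
    native_width, native_height = native
    if width is None and height is None:
        return native  # inserted as-is
    if height is None:
        return width, native_height * width / native_width
    if width is None:
        return native_width * height / native_height, height
    return width, height


sizes = [parse_size(s) for s in ('3', '40 mm', '120px')]
scaled = fit((4, 2), width=2)
```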
Content¶
<figure-ref>
¶
Insert a reference to a <figure>, or another tag playing the role of a <figure>.
The reference will look something like Figure 1.2-1, depending on the configured heading depth and separators.
Attributes¶
- id : Python Identifier
- The
id
of the corresponding <figure>.
Content¶
<kwd>
¶
Perform a keyword replacement. Keywords are defined as in the IPC File. The entire tag is replaced with the value of the keyword.
Attributes¶
- name : Python Identifier
- The name of the keyword to replace.
- formatopt :
format_spec
- A format specification that can be used to convert the value into a string.
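Assuming the format attribute is passed to Python's built-in `format()` (the `format_spec` type suggests this, but it is not confirmed here), the replacement can be sketched as below. The helper and the keyword values are hypothetical.

```python
import datetime


def replace_keyword(keywords, name, format_spec=''):
    """Return the text that would replace a keyword reference."""
    # An empty format_spec is equivalent to str(value) for most types.
    return format(keywords[name], format_spec)


keywords = {'InvoiceDate': datetime.date(2020, 1, 15), 'Total': 1234.5}
date_text = replace_keyword(keywords, 'InvoiceDate', '%Y-%m-%d')
total_text = replace_keyword(keywords, 'Total', ',.2f')
plain_text = replace_keyword(keywords, 'Total')
```

Date objects interpret the specification as a `strftime` pattern, while numbers use the standard format mini-language.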
Content¶
<latex>
¶
Insert a LaTeX formula into the document as an image. This tag is only available if the appropriate dependencies are installed.
Equations interrupt the current run if their run style does not match the style of the current run.
Attributes¶
- styleopt : Character Style
- The name of the style of the run containing the equation. The run style can be used to position the image relative to the normal flow of text. Must be defined in the DOCX Stub and be a character style.
- pstyleopt : Paragraph Style
- The name of the style to use for the equation’s paragraph, if it appears outside of an existing paragraph. Ignored if this tag appears inside a <par> tag. If used, must be defined in the DOCX Stub and be a paragraph style.
- dpiopt :
int
- The DPI of the output image. Defaults to 96.
- formatopt : Image Format
- The output format, defaults to
'jpg'
.
- sizeopt :
int
or None
- The text size, in points, used to render the equation. The default is to let LaTeX decide.
<n>
¶
Insert a line-break into the document. Line breaks only make sense within a paragraph, so this tag is ignored with a warning outside <par> tags.
Normally, this tag should appear inside a <run>. If not, the line break will be appended to the previous <run> in the current paragraph, or a new run will be created for it if it appears as the first tag.
Attributes¶
None
Content¶
<par>
¶
Contains a paragraph of text. A paragraph is a collection of runs of differently formatted text, as well as some other elements. A paragraph can be styled with a paragraph-level style. Runs within a paragraph can have additional character-level styling that combines with or overrides the paragraph style.
Paragraphs should appear immediately under the document root to avoid warnings. Paragraphs that do not follow this rule (e.g., by being nested within each other) will be broken up unpredictably, with a slew of warnings.
Paragraphs are automatically referenceable if they have a heading style.
Non-heading paragraphs must explicitly declare their
role to be par
just like any non-par
tag
posing as a heading. References can be made using the
<segment-ref> tag.
Attributes¶
- styleopt : Paragraph Style
- The name of the style to use for this paragraph. Must be defined in the DOCX Stub and be a paragraph style.
- idopt : Reference ID
- The ID of this paragraph, if it is being used as the target of a
<segment-ref>. If an ID is not supplied, the segment can
be referenced only through the
title
attribute of the <segment-ref>. IDs will be ignored for any non-heading paragraph without an explicit role.
- listopt : {
continued
, bulleted
, numbered
}
- If this paragraph is a list item, set this attribute to one of the allowed values. Options are case insensitive, and can be truncated:
bullet
and
NUM
are both examples of valid options as well.
This attribute is required to make a list item. If it is missing, the paragraph will not be bulleted/numbered, even if a list style is applied to it.
continued
will continue the style/numbering of the previous list item, no matter how many other items were inserted in between. The other options always start a new list with the default style determined by the list type.
- list-levelopt :
int
- An integer between zero and infinity specifying the depth of the current
list item. Numbers are generated automatically. If the paragraph
immediately preceding this one is a list item, the depth is preserved by
default (as is the style). Otherwise, the default depth for a new list is 1.
Missing depth-levels get filled in automatically if the depth jumps by an
increment of more than 1. Ignored if
list
is not set.
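The case-insensitive, truncatable matching described for the list attribute above can be sketched as a prefix lookup. The helper is hypothetical, and rejecting empty values is an assumption made for the example.

```python
LIST_OPTIONS = ('continued', 'bulleted', 'numbered')


def parse_list_option(value):
    """Expand a possibly truncated, case-insensitive option to its full name."""
    key = value.strip().lower()
    matches = [option for option in LIST_OPTIONS if key and option.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f'Invalid list option: {value!r}')
    return matches[0]


parsed = [parse_list_option(v) for v in ('bullet', 'NUM', 'cont')]
```

Since the three options start with different letters, even a single character is enough to identify one of them under this scheme.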
Content¶
Tags only. Any spurious text that is found will be placed into a run with the default style, along with a warning.
<run>
¶
Contains a run of text, which is normally just characters, with optional keyword replacements. Runs are aggregated into <par> tags. A run can have a character-level style independent from all the other runs in the paragraph.
Attributes¶
- styleopt : Character Style
- The name of the style to use for this run of characters. Must be defined in the DOCX Stub and be a character style.
Content¶
Text and tags. Runs should always appear directly inside a <par> tag. Nested <run> will cause a fatal error. Runs outside a <par> tag will cause a warning and an implicit paragraph to be placed around them. Most other tags are allowed in a run, but may interrupt the run, to be resumed after with the same character style.
<section>
¶
Introduces a new section into the document. Sections define the page parameters in the document. This tag begins a new section (rather than enclosing a section), which will continue until the next <section> tag or the end of the document.
Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed on the following page with the same styles.
Attributes¶
- orientationopt : {
'Portrait'
,'Landscape'
}
- The page orientation of this section. Values are case-insensitive.
The supported attributes for this tag may be expanded in the future.
Content¶
<segment-ref>
¶
Insert a reference to a <par> with a heading style, or another tag playing the role of a heading <par>.
The reference will look something like Section 1.2-1: Title, depending on the configured prefix, heading depth and separators.
Attributes¶
- idopt : Python Identifier
- The id of the corresponding <par>.
- titleopt : String
- The actual text of the corresponding <par>.
One of id and title must be present. If both are present, they must refer to the same target, or a fatal error will occur.
Content¶
<skip>
¶
Marks a piece of text for further investigation, without any other side-effects.
The only purpose of this tag is to provide better logging of marked text, and to suppress warnings when it occurs.
Attributes¶
None
Content¶
Text and tags.
<string>
¶
Generates a dynamic string based on the selected handler. Strings are expected to appear within a <run>. Any other location will generate a warning.
This tag is similar to <kwd>, except that it creates content based on a dynamic runtime configuration rather than just the static mapping of keywords.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the string. The name must appear in the IDC File.
- handler : str
- The full name of the string handler class that will generate the content.
Content¶
<table>
¶
Generates a table using the selected handler. Tables are constructed directly in the document, so any errors generated by the handler will result in a table stub along with the alt-text being placed in the document.
Tables are stand-alone entities. If this tag appears inside a <run> or <par> tag, a warning will be logged, and the paragraph and character styles will be resumed as necessary after the table.
Tables are referenceable through the <table-ref> tag.
Attributes¶
- id : Python Identifier
- The name of the Data Configuration dictionary for the table. The name must appear in the IDC File. This is also the ID used by the <table-ref> tag to link back to this tag.
- handler : str
- The full name of the table handler class that will generate the content.
- styleopt : Table Style
- The name of the style to use for this table. Must be defined in the DOCX Stub and be a table style.
Content¶
<table-ref>
¶
Insert a reference to a <table>, or another tag playing the role of a <table>.
The reference will look something like Table 1.2-1, depending on the configured heading depth and separators.
Attributes¶
- id : Python Identifier
- The id of the corresponding <table>.
Content¶
<toc>
¶
Insert a Table of Contents (TOC) into the document. Must appear outside any <par>, or a warning will be issued, and any surrounding run and paragraph will be broken, to be resumed after the TOC with the same styles.
Attributes¶
- minopt : int
- The minimum heading level that the TOC supports. Defaults to 1.
- maxopt : int
- The maximum heading level that the TOC supports. Defaults to 3.
- styleopt : Paragraph Style
- The name of the style to use for the heading paragraph. Must be defined in the DOCX Stub and be a paragraph style.
The name of the style of the heading within the TOC.
Extensions¶
Additional tags may be registered through the XML Tag API. New tags may not conflict with existing names, but otherwise have no real restrictions.
Glossary¶
The following terms are used frequently throughout this document:
- error
- A logged message that means that the current operation was aborted. The remainder of the document will still be processed.
- fatal error
- An error that is unrecoverable. In addition to being logged and aborting the current operation, the remainder of the document will not be processed.
- Image Format
- A short string indicating an image format for conversion tools. Common formats include 'jpg', 'png', 'bmp', etc. Most Imprint features will default to either JPG or PNG format.
- No Content
- Nesting a tag or placing text in a tag that has this content description will cause a fatal error. The tag must effectively be of the form <tag/> or <tag></tag>. Whitespace is not considered to be content, so it may be present between an opening and closing tag.
- referenceable
- A tag is referenceable if it has a role attribute, or if it has reference functionality built into it. For more information on references, see the corresponding section in the tag API description: References.
- Text Only
- Nesting a tag in a tag that has this content description will cause a fatal error.
Plugin API¶
All complex custom content in Imprint is generated by the
plugins. Plugins are implemented by
special configurable callable
objects called handlers that
follow a specific interface, which allows them to be referenced by the
appropriate tags in the XML Template.
Three types of content are supported out of the box: Figures, Tables and Strings. Each type of plugin accepts a mapping of keywords from the IPC File (and the <expr> tags in the XML Template), and a dictionary of data configuration values from the IDC File that defines the behavior of the plugin. Beyond that, each type of handler has a different interface.
In fact, any custom TagDescriptor
may define its
own plugin interface. What makes a tag pluggable is its reliance on a function
that accepts a data configuration. This technically makes the plugin API an
implementation of a very distinct part of the XML Tag API.
Contents
Data Configuration¶
The data configuration is the second argument to every handler. The data
configuration is a mapping set for every plugin in the
IDC File. The name of the configuration dictionary is in the
id attribute of the corresponding
<figure>, <table> or
<string> placeholder tag in the XML Template.
Custom tags may be registered as configurable plugins by setting the
data_config
attribute of their
TagDescriptor
.
Data configuration values can contain any type of values, as long as they are
meaningful to the plugin. Plugins may require some keys to be present in the
configuration, and should raise a KnownError, optionally caused by a KeyError,
in response to missing keys. Most plugins will require some sort of data
source, such as a file name, but again, this is not required.
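As an illustration of the missing-key convention, here is a minimal sketch of a handler that validates its configuration. KnownError is declared locally as a stand-in so that the snippet runs on its own; a real plugin would use the exception exposed by imprint.core. The helper require_keys, the handler line_count and the text key are hypothetical names, not part of Imprint.

```python
class KnownError(Exception):
    """Local stand-in for imprint.core.KnownError, so the sketch runs alone."""


def require_keys(config, *keys):
    """Return the requested values, or raise KnownError caused by KeyError."""
    try:
        return [config[key] for key in keys]
    except KeyError as exc:
        # Chain the original KeyError so the cause stays visible in logs.
        raise KnownError(f"Missing configuration key {exc}") from exc


def line_count(config, kwds):
    """Hypothetical string handler: report the number of lines in a text."""
    (text,) = require_keys(config, "text")  # "text" is an assumed key name
    return str(len(text.splitlines()))
```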
Some values are special, in that they can override XML attributes used by the
TagDescriptor. In particular, the handler attribute can be overridden by a key
with the same name in the IDC File. Overridable values are noted for each
builtin tag in the XML Template Specification.
Handlers¶
Handlers are named by the handler
attribute of the corresponding
<figure>, <table> or
<string> placeholder tag in the XML Template.
The exact class name (including package) is searched for the handler. If not
found, a prefix of imprint.handlers
is prepended to the nominal package
name.
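The lookup order described above can be sketched roughly as follows. This is not Imprint's implementation: resolve_handler is a hypothetical function, and the way a dotted name is split into module and attribute is an assumption made for illustration.

```python
import importlib


def resolve_handler(name, prefix="imprint.handlers"):
    """Try the exact dotted name first, then retry with the prefix prepended."""
    for candidate in (name, f"{prefix}.{name}"):
        module_name, _, attribute = candidate.rpartition(".")
        if not module_name:
            continue  # a bare name has no package to import from
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError):
            continue  # fall through to the prefixed candidate
    raise LookupError(f"no handler named {name!r}")
```

With this sketch, a name such as 'figure.ImageFile' would first be tried as given and then as 'imprint.handlers.figure.ImageFile'.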
The handler can be overridden in the Data Configuration dictionary. Normally,
all configuration keys are interpreted directly by the handler. However, a
special handler key will be processed before a handler is found, and can
override the setting in the XML. This mechanism is provided by each of the
Tag Descriptors for Figures, Tables and Strings. It allows for more flexible
debugging, and modification of existing templates. New Tag Descriptors can use
the get_handler function to implement the same functionality, although it is
not strictly required.
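The precedence rule amounts to a dictionary lookup with a fallback. The sketch below uses a hypothetical helper, select_handler_name, and does not reproduce the signature or behavior of the real get_handler function.

```python
def select_handler_name(attr, config):
    """Prefer the 'handler' key of the data configuration over the XML attribute."""
    return config.get("handler", attr.get("handler"))


# The XML names one handler, the data configuration overrides it.
xml_attributes = {"id": "summary", "handler": "imprint.handlers.string.TextFile"}
data_config = {"handler": "mypackage.DebugString"}  # hypothetical class name
chosen = select_handler_name(xml_attributes, data_config)
```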
Figures¶
Some built-in figure handler examples can be found in
imprint.handlers.figure
.
Handler Signature¶
handler(config, kwds[, output])¶
Generate an image based on the Data Configuration. If an output is specified, it will be a string or file-like. A string indicates an output file name, which the handler may modify and return. A file-like can be assumed to be open for binary writing, with random access enabled. It should be rewound before being returned.
Parameters:
- config (dict) – The Data Configuration for the figure.
- kwds (dict) – The keyword dictionary for the figure.
- output (str or file-like or None) – The name of the output file, or the output file to save the figure to. If omitted, the output must go to an in-memory file-like object like io.BytesIO. The handler may determine the output format based on the file extension, but this is not required. Each handler should have a default format for extensionless files and omitted output.
Returns: Either the actual output file name, or an in-memory file-like object, rewound to the beginning, containing the image. A string output will not necessarily be the input file name. It may, for example, have an extension appended to it. A None return value indicates an internal non-fatal error.
Return type: str or file-like
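A minimal sketch of a handler that follows this interface is shown below. The handler name placeholder_figure and the data key are hypothetical, and the choice of '.png' as the default extension is an assumption. A real handler would render or convert an image instead of writing prepared bytes.

```python
import io
import os


def placeholder_figure(config, kwds, output=None):
    """Hypothetical figure handler that writes pre-rendered image bytes."""
    data = config["data"]  # assumed key holding raw image bytes
    if output is None:
        output = io.BytesIO()  # no output given: use an in-memory file
    if isinstance(output, str):
        if not os.path.splitext(output)[1]:
            output += ".png"  # default format for extensionless names
        with open(output, "wb") as file:
            file.write(data)
        return output  # may differ from the requested name
    output.write(data)
    output.seek(0)  # rewind before returning, as the interface requires
    return output
```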
Tables¶
Some built-in table handler examples can be found in
imprint.handlers.table
.
Handler Signature¶
handler(config, kwds, doc, style, *, image_log_name=None)
Generate a table based on the Data Configuration. The handler is responsible for generating a table of the correct size and styling it properly based on the style parameter.
This type of plugin is expected to have no return value.
Parameters:
- config (dict) – The Data Configuration for the table.
- kwds (dict) – The keyword dictionary for the table.
- doc (docx.document.Document) – The document to insert the table into. The handler is responsible for invoking the add_table method.
- style (str) – The name of the style to apply to the generated object.
- image_log_name (str, Path-like or None) – The name of the image log to use if table data is to be logged. If log_images is off, this will be None. May be completely ignored by the handler if impractical or inappropriate to implement. The file name, if supplied, is provided without any extension.
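The following sketch shows one way a table handler might satisfy this signature. The handler name grid_table and the rows key are hypothetical. The only python-docx calls used are add_table on the document, and cell(...).text on the resulting table.

```python
def grid_table(config, kwds, doc, style, *, image_log_name=None):
    """Hypothetical table handler: render config['rows'] as a plain grid."""
    rows = config["rows"]  # assumed key: a non-empty list of equal-length rows
    table = doc.add_table(rows=len(rows), cols=len(rows[0]), style=style)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            table.cell(r, c).text = str(value)
    # image_log_name is ignored here, which the interface permits.
    # No return value: the table was created directly in the document.
```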
Strings¶
Some built-in string handler examples can be found in
imprint.handlers.string
.
Handler Signature¶
handler(config, kwds)
Generate a string based on Data Configuration.
Parameters:
- config (dict) – The Data Configuration for the string.
- kwds (dict) – The keyword dictionary for the string.
Returns: The newly created string. A None return value indicates an internal non-fatal error.
Return type:
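A string handler can be as small as a single function. The sketch below assumes a hypothetical template key in the data configuration, and uses the None return value to report a keyword that could not be substituted.

```python
def templated_string(config, kwds):
    """Hypothetical string handler: fill a template with keyword values."""
    template = config.get("template", "")  # assumed key name
    try:
        return template.format(**kwds)
    except (KeyError, IndexError):
        return None  # a replacement field was unavailable: non-fatal error
```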
Errors¶
Since plugins implement a subset of the tag-processing functionality, the same rules apply to plugin errors as to generic tag errors. See Errors in the XML Tag API section.
Builtin Plugins¶
Imprint is packaged with a small number of pre-defined builtin plugin Handlers for general purpose use. In addition to being useful on their own, these plugins provide a starting point for advanced users wishing to write their own. Handlers are grouped into sub-packages according to the tag they support.
Figures¶
imprint.handlers.figure
is the root package for built-in
Handlers for inserting figures into a document.
All the handlers in this module are compatible with the plugin interface used by the <figure> tag. This package exposes all the handlers defined in its submodules.
imprint.handlers.figure.ImageFile(config, kwds, output=None)¶
Generate python-docx compatible images from image files.
Copy image files as-is, or load them into memory. Output must be to a file of the same type as the input (except for PDFs): no conversion is done, only direct copy. PDFs (identified by the '.pdf' extension) get special handling to convert them into usable images.
The following Data Configuration keys are used:
- file
- The (mandatory) file name containing the image.
- formatted
- Whether or not file is a format string that has keyword replacements in it. Defaults to truthy. Set to falsy if the name contains random opening braces.
Notes
Using this plugin with PDF files requires the poppler library mentioned in the External Programs.
Submodules¶
imprint.handlers.figure.images
contains basic built-in
Handlers for inserting images into a document. All the
handlers in this module are compatible with the
plugin interface used by the
<figure> tag.
Strings¶
imprint.handlers.string
is the root package for built-in
Handlers for inserting strings into a document.
All the handlers in this module are compatible with the plugin interface used by the <string> tag. This package exposes all the handlers defined in its submodules.
imprint.handlers.string.TextFile(config, kwds)¶
Generate a string directly from the contents of a text file.
Text files are inserted literally, with no styling information beyond that of the <string> tag that triggered the plugin. Newlines are not preserved.
The following Data Configuration keys are used:
- file
- The (mandatory) file name.
- formatted
- Whether or not file is a format string that has keyword replacements in it. Defaults to truthy. Set to falsy if the name contains random opening braces.
Submodules¶
imprint.handlers.string.strings
contains the basic built-in
Handlers for inserting strings into a document. All the
handlers in this module are compatible with the
plugin interface used by the
<string> tag.
Utilities¶
imprint.handlers.utilities
contains common utilities for
handlers. Users wishing to write their own handlers may want to use
these functions to facilitate a uniform interface. Existing handlers in
this package use these functions as well.
imprint.handlers.utilities.get_key(config, kwds, key, default=None, formatted='formatted', missing_ok=True)¶
Retrieve the value of key from the mapping config.
If key does not exist in config, return default instead.
If formatted is a string, it names the key in config that determines whether key is a format string or not (default is yes). Otherwise, it is interpreted as a boolean directly.
Parameters:
- config (dict) – The Data Configuration dictionary to search.
- kwds (dict) – The Keywords dictionary to use for replacements if formatted turns out to be truthy.
- key (str) – The name of the key in config containing the required value.
- default – The value to return if key is missing from config.
- formatted (str or bool) – Either the name of the key to get the formatted flag from (if a string), or the flag itself. In either case, ignored if the value is not a string.
- missing_ok (bool) – If truthy, missing values are replaced by default. Otherwise a KeyError is raised.
Returns: The value in config associated with key, optionally formatted with kwds.
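To make the documented behavior concrete, here is a rough re-implementation sketch. It is not the library's code: the function name get_key_sketch is hypothetical, the use of str.format for keyword replacement is an assumption, and so is the choice to return the default without formatting it.

```python
def get_key_sketch(config, kwds, key, default=None, formatted="formatted",
                   missing_ok=True):
    """Approximate the documented behavior of get_key."""
    if key not in config:
        if missing_ok:
            return default  # assumption: defaults are returned unformatted
        raise KeyError(key)
    value = config[key]
    if isinstance(formatted, str):
        # The flag itself lives in the configuration and defaults to truthy.
        formatted = config.get(formatted, True)
    if formatted and isinstance(value, str):
        value = value.format(**kwds)  # assumed replacement mechanism
    return value
```

For example, a file value of '{serial}.png' combined with a serial keyword of 'A1' would yield 'A1.png', unless the configuration sets formatted to a falsy value.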
XML Tag API¶
The Imprint engine comes with a complete set
of processors for the tags specified in the XML Template Specification. However, additional
tags may be necessary for highly customized applications, so an API exists for
defining and registering new tags. The API is defined in the
imprint.core.tags
module. Example usage can be found in the
Writing Custom Tags tutorial.
Contents
Tag Descriptors¶
The tag API revolves around the TagDescriptor
class. The class can
be extended directly, or instantiated through a delegate object that fulfills
the necessary duck-type API. Objects contain a set of attributes and two
callbacks that define how to handle XML tags of a given type. All the elements
are optional and have sensible default values.
Any registered object will be viewed through TagDescriptor.wrap
, so
it is not necessary to extend or instantiate TagDescriptor
to
create a working tag descriptor.
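A delegate can therefore be an ordinary object. The sketch below shows a hypothetical delegate for an imaginary <note> tag. The attribute names required, optional and tags, and the callback arguments (engine state, tag name, attribute dictionary), follow the descriptions in this document. The SimpleNamespace object is only a stand-in for the real engine state.

```python
from types import SimpleNamespace


class NoteTag:
    """Hypothetical delegate for a <note> tag; it does not extend TagDescriptor."""

    required = ("author",)          # attributes that must be present
    optional = {"priority": "low"}  # optional attributes and their defaults
    tags = False                    # nested tags are not allowed

    def start(self, state, name, attr):
        # Remember the opening tag's attributes on the mutable state.
        state.note = dict(attr)

    def end(self, state, name, attr):
        # Delete the state attribute rather than setting it to None.
        del state.note


state = SimpleNamespace()  # stand-in for the real EngineState
tag = NoteTag()
attributes = {"author": "jd", "priority": "low"}
tag.start(state, "note", attributes)
in_note = hasattr(state, "note")
tag.end(state, "note", attributes)
```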
Errors¶
Tag descriptors may raise any type of error they deem necessary in their
start and end methods. Most classes of errors will be logged and cause the
application to abort. However, two special classes of errors will not cause a
fatal crash:
- KnownError is used to flag known conditions that can be handled gracefully by the tag.
- OSError. Specifically, the FileNotFoundError and PermissionError subclasses are deemed to be “known errors”. If they represent a fatal condition, they should be wrapped in another exception type.
Any plugins with a dynamic Data Configuration will generally receive an alt-text placeholder where the content would normally go instead of completely aborting.
exception imprint.core.KnownError¶
A custom exception class that is used by the engine to indicate that a tag or plugin handler exited for a known reason.
In cases where this exception is logged, the message is printed without a stack trace.
Configuration¶
Tags have two types of configuration available to them. Static configuration for a given XML Template is provided through the tag attributes in the XML file. Dynamic configuration through the IDC File can be enabled to provide per-document fine-tuning.
XML Attributes¶
XML attributes are supplied to the start and end methods of a TagDescriptor as
the second argument. The inputs are presented to both methods as a vanilla
dict. The dictionary is meant to be treated as read-only, but this is not a
requirement, meaning that technically start can modify what end sees. The
dictionary is filtered to exclude any attributes that are not listed in the
required and optional elements of the TagDescriptor.
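The filtering step can be pictured with the following sketch. The function filter_attributes is hypothetical, and raising ValueError for a missing required attribute is an assumption, since the exact error type is not stated here.

```python
def filter_attributes(attr, required=(), optional=None):
    """Keep only declared attributes, filling in defaults for optional ones."""
    optional = dict(optional or {})
    missing = [name for name in required if name not in attr]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")  # assumed type
    filtered = dict(optional)  # start from the defaults
    for name, value in attr.items():
        if name in required or name in optional:
            filtered[name] = value  # undeclared attributes are dropped
    return filtered
```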
Data Configuration¶
For some types of content, static configuration is not enough. To allow
per-document configurations, a TagDescriptor
must define a
non-None
data_config
attribute. This
attribute gives the name of the dictionary to extract from the
IDC File.
start
and end
methods of a
TagDescriptor
with the data_config
attribute set will receive an additional input argument containing the
Data Configuration loaded from the IDC File.
The data configuration can override some of the static XML Attributes of a tag. For built-in tags, the XML Template Specification notes which attributes can be overridden. Built-in tags that support dynamic configuration are <figure>, <table> and <string>.
All built-in tags that support dynamic configuration also support a type of plugin, but this is not a requirement for custom tags.
References¶
A TagDescriptor
is referenceable if it has a
non-None
reference
. A reference made to a
tag will be substituted by the appropriate reference text. By default reference
tags have the target tag name with “-ref” appended:
<figure-ref> references <figure>,
<table-ref> references <table>. A notable
exception is <segment-ref>, which references paragraphs
(<par> tags), but only ones that have a heading style.
References are usually identified by a required id attribute. Segments can
also be identified by the title of the segment, which is the aggressively
trimmed collection of all the text in the paragraph. For example, the title of
the following XML snippet would be 'Example Heading':
<par style="Heading 3">
<run style="Default Paragraph Font">
Example
Heading
</run>
</par>
<segment-ref> tags can therefore identify their target with
either an id or title attribute. User-defined tags can implement their
own customized rules for identifying targets.
Roles¶
For the purpose of creating references, any tag may impersonate, or play the role of, any other tag using a special role attribute. This attribute is implicitly optional for every tag. It is interpreted directly by the parsers in the Engine Layer to determine the type of reference that a tag will represent.
For example, a <table> tag (or any other tag for that
matter), which has role="figure"
must be referenced by a
<figure-ref> tag, not a <table-ref> tag,
in the XML Template. That table will be a figure for the purposes
of the document in question.
Any arbitrary tag can be referenced the same way with the appropriate role. Usually, such a referenceable tag will be styled appropriately, and will have the headings, captions, etc. appropriate for its role rather than its nominal tag.
A specific case is arbitrary tags that have a <par> role.
Such tags are automatically referenceable by <segment-ref>.
Their entire contents will be treated as the title of the heading, so the
par
role must be used carefully.
Registering New Tags¶
Once a TagDescriptor
or a delegate object has been constructed,
there are two main ways to get Imprint to use the descriptor for actual tag
processing.
Via Configuration¶
In the normal course of things, Imprint will not automatically import unspecified user-defined modules. To let it know where to find tag extensions, add them by name or by reference to the IPC File to the mapping in the tags keyword. This will automatically import all the necessary modules, and register the custom descriptor under the requested tag name.
Programmatically¶
Under the hood, tags are registered with the Imprint core simply by adding them
to tag_registry
:
tag_registry[name] = descriptor
The registry is a special mapping that ensures that name
is a string not
representing an existing tag. While it is not possible to remove or overwrite
existing tags, the same descriptor can be registered under multiple names.
This method is useful mostly to users wishing to write a custom driver program for the engine. Under normal circumstances, the configuration solution will be more suitable.
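The restrictions on the registry can be illustrated with a small stand-in mapping. This is a sketch, not the real tag_registry: the class name RestrictedRegistry is hypothetical, and rejecting non-string names with TypeError is an assumption. Raising KeyError on overwrite and TypeError on deletion follows the registry description in The API section.

```python
from collections.abc import MutableMapping


class RestrictedRegistry(MutableMapping):
    """Add-only mapping: no overwriting and no deletion of registered names."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, name):
        return self._items[name]

    def __setitem__(self, name, descriptor):
        if not isinstance(name, str):
            raise TypeError("tag names must be strings")  # assumed error type
        if name in self._items:
            raise KeyError(f"tag {name!r} is already registered")
        self._items[name] = descriptor

    def __delitem__(self, name):
        raise TypeError("registered tags can not be removed")

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```

Because the class derives from MutableMapping, bulk operations such as update go through the same checks as individual assignments.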
Engine State¶
Both callbacks of a TagDescriptor
accept an
EngineState
object as their first argument,
which supports stateful tag processing. The engine state provides a mutable
container for arbitrary attributes. Each TagDescriptor
can add,
remove and modify attributes of the state object to communicate with itself,
the engine, and other tags.
As a rule, objects should prefer to delete state attributes rather than setting
them to None
. This meshes well with the fact that
EngineState
provides a containment check. For
example, to check if the parser is in the middle of a run of text, descriptors
should check
if 'run' in state: ...
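A minimal sketch of that convention follows, using a stand-in class rather than the real EngineState. How the containment check is implemented in Imprint is an assumption here.

```python
class StateSketch:
    """Stand-in for EngineState: a mutable attribute container with `in` support."""

    def __contains__(self, name):
        return hasattr(self, name)


state = StateSketch()
state.run = "current run"  # entering a run
inside = "run" in state    # the attribute exists while the run is open
del state.run              # leaving the run: delete instead of assigning None
outside = "run" in state
```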
The built-in tags and the engine use a set of attributes and methods to operate
properly. Modifying these predefined attributes in a way other than explicitly
documented will almost inevitably lead to unexpected behavior. Properties are
used instead of simple attributes in a few cases to provide sanity checks for
the supported modifications. Custom tags can add, remove and modify any
additional attributes they choose. The full list of built-in attributes is
available in the EngineState
documentation.
The API¶
The imprint.core package contains the Imprint Engine Layer. The tags and state
modules implement most of the functionality useful to end-users through the
public XML Tag API. The parsers and utilities modules contain the Internal API.
The imprint.core.tags module implements the base XML Tag API, as well as all
the predefined Built-in Tag Descriptors and Reference Descriptors.
The following members are used to construct and register new tags:
A limited mapping type that contains all the currently registered tag descriptors.
Registering a new descriptor is as easy as doing:
tag_registry[name] = descriptor
The registry is a restricted mapping type that supports adding new elements only if they are not already registered. Existing elements can not be deleted. Deletion operations will raise a TypeError, while overwriting existing keys will raise a KeyError. Aside from that, all operations supported by dict are allowed (including things like update).
Any tag that is referenceable by design (has a valid reference attribute) will have the ReferenceDescriptor’s registration hook invoked after the tag-proper is registered.
The built-in tags are registered when the current module is imported.
The basis of the tag API.
Instances of this class contain the information required to process a custom tag. They must contain all of the attributes listed below, with the expected types. The elements in tag_registry may be delegate objects that supply only part of the attribute set. In that case, they are wrapped in a proxy as needed at runtime, never up-front. The reason for this is twofold:
- There may be stateful objects registered for multiple tags, and wrapping in a proxy will not allow the tags to share state. This would not be a problem, except it would be unexpected behavior.
- Some of the attributes may be dynamic properties (or other descriptors). Fixing the value once would completely defeat such behavior.
Creating an occasional wrapper around a delegate is not expected to be particularly expensive, even if it had to be done for every tag encountered in the XML file. On the other hand, it allows for some very flexible behaviors. At the same time, very few instances of wrapping should occur, since most tags will be implemented by extending this class and implementing it properly. The wrap method ensures that all extensions are passed through as-is.
All the Built-in Tag Descriptors are instances of children of this class.
A tri-state bool flag indicating whether the tag is allowed/expected to have textual content or not. The values are interpreted as follows:
- None
- The tag may not have any content. It must be of the form <tag/> or <tag><otherTag>...</otherTag></tag>. Anything else will raise a fatal error. If tags is set to False, only the former form is allowed.
- False
- The tag should not have content, but content will not raise an error. A warning will be raised instead.
- True
- The tag is expected to have content, but the content may be empty.
Any value is allowed in a delegate. If defined, the value will be converted to bool if it is not None. Defaults to None if not defined.
A bool indicating whether or not nested tags are allowed within this one.
Any value is allowed in a delegate. If defined, the value will be converted to bool. Defaults to True if not defined.
A tuple of strings containing the names of required tag attributes. A tag encountered without all of these attributes will raise an error.
In a delegate, this may be a single string, an iterable of strings, None or simply omitted. Every element of an iterable must be a string, or a TypeError is raised immediately during construction. Defaults to an empty tuple if not defined.
A dictionary mapping the names of optional attributes to their default values. Optional attributes are ones that are expected to be present in processing, but have sensible defaults that can be used, meaning that they do not have to be specified explicitly in the XML Template.
In a delegate, this may be any mapping type, an iterable of strings, a single string, None or simply omitted. In the case of an iterable or individual string, all the defaults will be None. Iterables and mapping keys must be strings, or a TypeError will be raised during construction. Defaults to an empty dict if not defined.
The name of the attribute containing the data configuration name for the tag. This should only be provided for tags that require Data Configuration. If provided, this tag will automatically be added to the required sequence.
In a delegate, this object must be an instance of str or None. Defaults to None if not defined.
A ReferenceDescriptor that is only present if this type of tag can be the target of a reference.
Examples of referenceable built-in tags are <figure>, <table> and sometimes <par>. Referenceable tags can have an optional role attribute that changes the type of reference they represent. See the Roles description for more information.
In a delegate, this object must be an instance of ReferenceDescriptor or None. Defaults to None if not defined.
After completion, this instance has all of the required attributes defined in the delegate, wrapped in the required types.
A reference to the delegate object is not retained. This method can be invoked multiple times. It updates the current descriptor with the attributes of the delegate, leaving undefined attributes in the delegate untouched.
Create an empty instance, with all required attributes set to default values.
This method is provided to allow bypassing the default
__init__
in child classes. All arguments are ignored.
Each descriptor should provide a method with this signature to process closing tags.
If implemented, this method must accept the Engine State, a tag name and a dict of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag. The attributes are the same as those passed to start, barring any modifications made in start.
Descriptors that have a non-None data_config attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Each descriptor should provide a method with this signature to process opening tags.
If implemented, this method must accept the Engine State, a tag name and a dict of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag.
Descriptors that have a non-None data_config attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Construct a proxy from the descriptor if it isn’t already one.
This method is provided so that when TagDescriptor objects are implemented properly up front, they do not need to be wrapped in an additional layer.
If the input is a delegate, the return value will always be of the type that this method was invoked on. However, the type check will always be done against the base TagDescriptor class.
Bases: imprint.core.tags.TagDescriptor
The base class of all the built-in TagDescriptor implementations.
Custom tag implementations are welcome to use this class as a base instead of a raw TagDescriptor.
Updates the required fields with the keywords that are passed in.
If no delegate object (or None) is supplied, bypass the default constructor (see TagDescriptor.__new__). kwargs will override any defaults and attributes set by a delegate.
Built-in Tag Descriptors¶
The existing tag descriptors implement the XML Template Specification:
Bases:
imprint.core.tags.BuiltinTag
Implements the <break> tag.
Insert a page break into the document.
Bases:
imprint.core.tags.BuiltinTag
Implements the <expr> tag.
Warning
This descriptor uses eval to execute arbitrary code and assign it to a new keyword. Use with extreme caution!
Evaluate the expression found inside the tag, and add a new entry to the state’s keywords.
The content_stack will be popped.
All errors in importing and evaluation will be propagated up and will terminate the parser.
Begin a new expression.
This just pushes a new content_stack entry in the state. All content until the closing tag will be evaluated as a set of Python statements.
Bases:
imprint.core.tags.BuiltinTag
Implements the <figure> tag.
Generate and insert a figure based on the selected handler.
Figures can appear in a run, a paragraph, or on their own.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <kwd> tag.
Find the value of the keyword in the state’s keywords and place it into the current content.
If the keyword is not found, a KeyError will be raised. If the tag has a format attribute, it is interpreted as a format_spec, and used to convert the value. If the attribute is not present, the value is converted with a simple call to str.
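The conversion rule maps directly onto Python's built-in format and str functions. The helper below, render_keyword, is a hypothetical illustration and not the descriptor's actual code.

```python
def render_keyword(keywords, name, format_spec=None):
    """Convert a keyword value with a format_spec if given, else with str."""
    value = keywords[name]  # a missing keyword raises KeyError
    if format_spec is None:
        return str(value)
    return format(value, format_spec)


keywords = {"voltage": 3.14159, "serial": 42}
formatted_value = render_keyword(keywords, "voltage", ".2f")
plain_value = render_keyword(keywords, "serial")
```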
Bases:
imprint.core.tags.BuiltinTag
Implements the <latex> tag.
Convert the equation in the text of the current tag into an image using haggis.latex_util.render_latex, and insert the image into the parent tag.
The parent can be a run or a paragraph. If the requested run style does not match the current run, the current run will be interrupted by a run containing a new picture with the requested style, and resumed afterwards. If there is no run to begin with, a new run will be created, but not stored in the run attribute of the state.
Formulas are rendered at 96dpi in JPEG format by default.
Begin a new LaTeX formula.
Just push a new content_stack entry into state. All content until the closing tag is evaluated as a LaTeX document.
Bases:
imprint.core.tags.BuiltinTag
Implements the <n> tag.
Add a line break to the current run.
If not inside a run, append the break to the last run. Make a new run only at the start of a paragraph. Ignore with a warning outside of a paragraph.
Bases:
imprint.core.tags.BuiltinTag
Implements the <par> tag.
Validate the list attribute that is found.
Log an error if the attribute is invalid, but do not terminate processing. The attribute is simply ignored if the list is neither numbered, bulleted nor continued.
Return the type normalized to a ListType, or None if not a list item. If the type is valid, and list-level is set, it is converted to an integer.
Compute the paragraph style based on whether an explicit style is set in the attributes, and whether or not the paragraph is a list.
1. If an explicit style is requested, return it. Otherwise:
2. If the paragraph is not a list, return the default paragraph style. Otherwise:
3. If the previous paragraph is a list item in the same list (i.e., the current list-level attribute is non-zero), return the style of the previous paragraph. Otherwise:
4. Return the default list item style.
Parameters:
- state (EngineState) – The state is used to check for the previous item’s style in case #3.
- attr (dict) – The tag attributes, used to check for an explicitly set style as well as for a style reset with list-level = 0.
- list_type (ListType or None) – The type of the list, if a list at all, as returned by check_list.
Terminate the current paragraph.
See end_paragraph in EngineState.
Terminate any existing paragraph, flush all text and start a new paragraph.
If the new paragraph is a list item, add the necessary metadata to it.
Issue a warning if an existing paragraph is found.
Bases:
imprint.core.tags.BuiltinTag
Implements the <run> tag.
Place any remaining text into the current run, and remove the run attribute of state.
Create a new run, ensuring that there is a paragraph to go with it.
Creating a run outside a paragraph raises a warning and creates a paragraph with a default style. See
imprint.core.state.EngineState.new_run
.
Bases:
imprint.core.tags.BuiltinTag
Implements the <section> tag.
Begin a new section in the document, optionally altering the page orientation.
Bases:
imprint.core.tags.BuiltinTag
Implements the <skip> tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <string> tag.
Generate a string based on the appropriate handler.
If the log_images key is set to a truthy value in state.keywords, the content will also be dumped to a file.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <table> tag.
Generate and insert a table based on the selected handler.
The handler creates the table directly in the document (unlike for figures, where only the final product is inserted). Any error that occurs mid-processing leaves a stub table in the document in addition to the automatically-inserted alt-text.
Tables appear on their own, outside any paragraph or run, so if a table is nested in a run or paragraph, a warning will be issued. Any interrupted run or paragraph resumes after the table with their prior styles.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <toc> tag.
Terminate and insert the TOC.
Gather any text that has been acquired into the heading, which will be a separate paragraph preceding the TOC.
If the TOC interrupted an existing paragraph, a new paragraph will be resumed with the same style as the original. If a run style is present as well, a run will be recreated too.
Create a new TOC.
Log a warning if the tag appears within a paragraph. Truncate the paragraph, and resume with the prior style. The same happens to the current run, if there is one.
Bases:
imprint.core.tags.BuiltinTag
Implements the <figure-ref> and <table-ref> tags.
This processor is not registered explicitly. It gets added by all of the target tags that use it as part of their registration process. Registering this processor under a name that does not end in '-ref' will lead to a runtime error in resolve.
Insert a string with the specified reference into the current content.
Returns a quasi-singleton instance of the current class.
This instance is not exposed directly, but it is registered by the built-in referenceable tags.
Overridable operation for fetching and logging the reference that is to be inserted.
The default is to look up the reference by 'id' in the references of the imprint.core.state.EngineState.
Used by the default implementation of end.
Bases:
imprint.core.tags.ReferenceProcessor
Implements the <segment-ref> tag.
This is a special case of ReferenceProcessor that allows access by both title and id. Its references always resolve to a <par> tag, or a tag playing that role.
Resolve a segment reference by either text or ID.
Either the id or title tag attribute must be present. If both are present, they must resolve to the same heading in the document or an error is raised.
Reference Descriptors¶
Defines the process for creating References and using them through the appropriate tag.
References are made by processing the XML Template and mapping out any referenceable tags using the start and end methods. In the default implementation, the reference text is created by the make_reference method, invoked from end.
start and end return a boolean value to allow custom tags to be processed selectively. A return value of False from either method means that the specific instance of the tag being processed is not a valid reference target. Normally both methods always return True, but for the builtin <par> tag, for example, an exception must be made.
References are placed into the document by a special TagDescriptor, which is generally registered along with the parent tag that contains a ReferenceDescriptor using the register method.
Current references are purely textual, rather than having a dynamic field assigned to them. This is still a work in progress.
The prefix that normally gets prepended to the reference text. Used by
make_reference
to construct the output string. Extensions are welcome to ignore this attribute.
A string or iterable of strings that lists the attributes that are used to identify the target for this reference type. The attribute may be either required or optional for the target tag, but it must be recognized either way. This attribute is used to check for attributes on tags with a non-default role. Defaults to
'id'
.
Process the closing tag for a referenceable tag.
The default is to add the reference to the appropriate map in references by ID, based on the role, and log the operation. The attribute id is required.
The actual reference is created by make_reference.
Returns True if the tag is definitely a reference target, False if not.
-
identifiers
Ensure that
identifiers
is read-only.
Returns a string referring to the specified tag in the specified role.
Keep in mind that the ReferenceDescriptor is selected based on the role, not necessarily the tag name. Therefore, the role argument should always be the “computed” role: the name of the tag should be overridden by the value of the attribute, if it was specified.
A registration hook that is invoked when the parent TagDescriptor is registered.
The default implementation registers an additional TagDescriptor under the name name + '-ref', which replaces the <name-ref/> tag with the formatted reference. See ReferenceProcessor.
Parameters:
- registry – The tag registry that the parent TagDescriptor is being inserted into. See tag_registry for details on the interface.
- name (str) – The name under which the parent tag is being registered.
- descriptor – The parent object being registered, not necessarily a TagDescriptor. The TagDescriptor.wrap method can be used to retrieve the corresponding TagDescriptor if necessary.
Check that the reference identified by key does not already exist and set it.
Duplicate reference targets cause an error, unless duplicates is True, in which case a warning is logged and the new value is discarded.
Bases:
imprint.core.tags.ReferenceDescriptor
Extension of ReferenceDescriptor to accumulate heading text and allow references through the title attribute.
Used by <par> tags to create heading references.
A class-level regular expression for identifying the <par> tags that represent referenceable headings.
Create a dual reference based on the title and optional ID in addition to the default logging.
Ensure that
identifiers
is read-only.
Add the section heading to the usual reference text.
Register a SegmentRefProcessor for the <segment-ref> tag.
This registration hook uses a fixed name, so can only be called once.
Check that the reference identified by key does not already exist and set it.
Duplicate reference targets cause an error, unless duplicates is True, in which case a warning is logged and the new value is discarded.
Start accumulating content in addition to the default logging.
If an actual <par> tag is encountered (as opposed to a tag playing that role), and the heading matches Heading \d+, the current heading is incremented in the state.
If any heading tag, or any tag with role="par" is encountered, a new reference will be created. Non-heading paragraphs with no explicit role are non-referenceable. A non-heading paragraph can be made referenceable by explicitly setting the role.
Keep in mind that the title for a segment reference is accumulated from all the text in the paragraph. Use carefully with non-default tags.
Utility Functions¶
Resolve the value of key with respect to attr, but with the option to override by the data configuration dictionary.
If the final value is sentinel, return default instead. Return default if key is missing entirely as well. Both attr and data must be mapping types that support a get method.
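The resolution order described here can be sketched as follows. The function name resolve_key and its signature are hypothetical stand-ins, not the library's actual interface; the sketch only assumes that data takes precedence over attr and that the sentinel is compared by identity.

```python
def resolve_key(key, attr, data, sentinel=None, default=None):
    """Hypothetical sketch: data overrides attr, sentinel maps to default."""
    # Look in the data configuration first, falling back to the tag attributes.
    value = data.get(key, attr.get(key, sentinel))
    # A sentinel result (including a missing key) yields the default.
    return default if value is sentinel else value


attr = {'width': '3in'}
from_attr = resolve_key('width', attr, {})
overridden = resolve_key('width', attr, {'width': '5cm'})
missing = resolve_key('height', attr, {}, default='2in')
```

Both mappings only need a get method, which is why plain dictionaries work in the example.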
Convert a string, number or pre-constructed size to a docx.shared.Length object, using get_key for value resolution.
Common options for key are 'width' and 'height'.
Valid unit suffixes are ", in, cm, mm, pt, emu, twip. The default when no units are specified is inches (").
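To illustrate the unit handling, here is a minimal sketch that converts a size to English Metric Units (EMU), the integer unit that docx.shared.Length is expressed in. The name parse_size and the regular expression are hypothetical; the real function resolves the value through get_key and returns a Length object rather than a bare integer.

```python
import re

# EMU per unit, following the Office Open XML conventions.
EMU_PER_UNIT = {
    '"': 914400, 'in': 914400, 'cm': 360000, 'mm': 36000,
    'pt': 12700, 'twip': 635, 'emu': 1,
}

# A number followed by an optional unit suffix (hypothetical grammar).
_SIZE = re.compile(r'\s*([0-9]*\.?[0-9]+)\s*("|in|cm|mm|pt|emu|twip)?\s*$')


def parse_size(value):
    """Hypothetical sketch: convert a number or string such as '2.5cm' to EMU."""
    if isinstance(value, (int, float)):
        # Bare numbers are treated as inches, the documented default unit.
        return round(value * EMU_PER_UNIT['in'])
    match = _SIZE.match(value)
    if match is None:
        raise ValueError(f'Unrecognized size: {value!r}')
    number, unit = match.groups()
    return round(float(number) * EMU_PER_UNIT[unit or 'in'])
```

For instance, parse_size('72pt') and parse_size('1"') describe the same length, since 72 points make one inch.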
Retrieve and load the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged and a
KnownError
is raised.
Load and run the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged, as with get_handler.
All exceptions that occur during execution are converted into KnownError.
Compute the required styles based on attr and data configurations.
Style keys are taken from the keys of defaults, while values provide the fallback names used if the keys do not appear in either attr or data. Similarly named keys in data will override ones in
attr
.
Create a dictionary with keys width and height and values that are instances of docx.shared.Length.
Values are resolved according to the rules of get_key, with width_key and height_key as the inputs. String values may contain units, and will be parsed according to get_size.
If neither key is present in either configuration (or present but set to None), set the width to default_width. If that is None as well, return an empty dictionary.
Parser State Objects¶
The imprint.core.state
module supplies the state objects that
enable communication within the Engine Layer
between the engine itself and the tags. The state is therefore crucial
to the XML Tag API without being completely a part of it.
-
class
imprint.core.state.
EngineState
(doc, keywords, references, log)¶ A simple container type used by the main parser to communicate document state to the tag descriptors.
Most of the state is dedicated to monitoring the status of the text acquisition from the XML. The engine and built-in tags rely on a set of attributes to function. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using in in preference to hasattr.
-
doc
¶ -
The document that is being built. Set once by the engine.
Implemented as a read-only property.
-
keywords
¶ -
The keywords configured for this document by the IPC File. Normally, this dictionary should be treated as read-only, but ExprTag can add new entries.
As a rule, keywords with lowercase names are system configuration options, while keywords that start with upper case letters affect document content.
Implemented as a read-only property.
-
references
¶ -
A multi-level mapping type that allows references to be fetched by role and attribute. Access to this map is performed by providing a tuple (role, attribute, key). For example:
state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using str.
The mapping is made immutable as soon as it becomes part of the state. The read-only lock is irreversible.
Implemented as a read-only property.
-
paragraph
¶ -
A paragraph represents a collection of runs and other objects that make up a logical segment in a document. This attribute exists only when parsing a <par> tag. Usually set and unset by ParTag, but can be temporarily switched off and reinstated in response to other tags as well.
end_paragraph deletes this attribute.
-
run
¶ -
A run is a collection of characters with similar formatting within a paragraph. This attribute exists only when parsing a <run> tag. Usually set and unset by RunTag.
end_paragraph deletes this attribute.
-
content
¶ -
A mutable buffer used by the engine to accumulate text from the XML Template.
Since whitespace needs to be trimmed rather aggressively from an XML file, this object gets an extra (non-standard) attribute:
-
content.
leading_space
¶ Indicates whether or not to prepend a space when concatenating this buffer with others. In general, the text of the first run in a paragraph is the only one that does not have this attribute set to True. This flag is set on the buffer rather than the state object itself so that buffers can be pushed onto and popped off the content_stack to handle nested tags.
This attribute should be manipulated mostly through the new_content, get_content and flush_run methods.
This attribute must always be present, regardless of the position within the document.
Implemented as a read-write property that can not be deleted or set to None.
-
-
content_stack
¶ collections.deque[io.StringIO]
A stack for nested content buffers. Each buffer represents a tag containing independent content. Some tags append to the parent’s buffer, some close the current buffer to start a new one and others, such as <figure>, use a temporary buffer for their content.
The stack allows for a theoretically indefinite level of nesting of text elements. In reality, it will only contain one or two elements: the current run text and the contents of interspersed tags like <figure>.
This attribute should be manipulated through the push_content_stack and pop_content_stack methods.
This attribute may be empty, but never missing. Implemented as a read-only property.
-
last_list_item
¶ -
List items in Word are just paragraphs with a particular style and numbering scheme. All of this information can be gathered from the previous paragraph that was assigned a concrete list numbering instance.
This attribute should never be missing. It should only be
None
to indicate that no prior numbered paragraph has occurred in the document yet. To this end, it is implemented as a read-only property.
-
latex_count
¶ -
A counter for the number of <latex> tags encountered so far. Used to generate the file name for the equations if Image Logging is enabled. Missing otherwise.
-
__contains__
(name)¶ Checks if the specified name represents an attribute.
-
check_content_tail
()¶ Include any remaining text in content into the last run of the last paragraph.
This ensures that paragraphs get truncated properly, and that spurious text between paragraphs is cleaned up.
A warning is issued if any non-whitespace text is found.
-
end_paragraph
(tag=None)¶ Terminate the current paragraph.
Any existing run is immediately terminated. Spurious text is appended to the last available run. Both paragraph and run attributes are deleted by this method.
If there is no paragraph to terminate, this method is equivalent to calling check_content_tail.
Parameters: tag (str or None) – The name of a tag that interrupts the paragraph. If present, a warning will be issued. If omitted, no warning will be issued.
-
flush_run
(renew=True, default='')¶ Flush the text buffer accumulating the current run into the document.
Text flushing aggressively removes whitespace from around individual lines. A single space character is prepended before the text if content.leading_space is True.
If not inside a run, this is a no-op.
Parameters:
- renew (bool) – Whether or not to create a new text buffer when finished. This is generally a good idea, since the content will already be in the document, so the default is True. The new buffer has leading_space set to True.
- default (str) – The text to insert if the current content buffer is empty. Defaults to nothing ('').
-
get_content
(default='')¶ Retrieve the text in the current content buffer.
Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is empty (or contains only whitespace), return default instead.
If the text is non-empty, and content has leading_space set to True, prepend a space.
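The whitespace handling can be sketched roughly as below. The function collapse_content is a hypothetical stand-in that takes the buffer and the flag as explicit arguments; dropping lines that are empty after stripping is an assumption, since the text above does not say how blank lines are treated.

```python
import io


def collapse_content(buffer, leading_space=False, default=''):
    """Hypothetical sketch of the documented whitespace collapsing."""
    # Strip each line individually, then rejoin with single spaces.
    lines = (line.strip() for line in buffer.getvalue().splitlines())
    text = ' '.join(line for line in lines if line)  # assumption: skip blank lines
    if not text:
        # Empty or whitespace-only buffers fall back to the default.
        return default
    return ' ' + text if leading_space else text


buffer = io.StringIO('   The quick brown fox\n      jumps over the lazy dog.\n')
first = collapse_content(buffer)
later = collapse_content(buffer, leading_space=True)
```

In this example, first would suit the first run of a paragraph, while later carries the leading space that separates it from preceding text.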
-
image_log_name
(id, ext='')¶ Create an output name to log an image (or data), for a Data Configuration with the given ID, and an optional extension.
This is the standard name-generator for any component (tag descriptor or plugin handler) that enables image logging in response to log_images.
The base name is the result of concatenating an extension-less log_file (or output_docx if not set), with id, separated by an underscore. ext is appended as-is, if provided.
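The naming rule can be illustrated with a short sketch. The function sketch_image_log_name is hypothetical, and it assumes that log_file and output_docx are available as entries of a keyword dictionary; the file names are made up for the example.

```python
import os.path


def sketch_image_log_name(keywords, id, ext=''):
    """Hypothetical sketch of the documented naming rule."""
    # Prefer the log file name; fall back to the output document name.
    base = keywords.get('log_file') or keywords['output_docx']
    root, _ = os.path.splitext(base)  # drop the extension
    return f'{root}_{id}{ext}'


keywords = {'log_file': 'reports/unit7.log', 'output_docx': 'reports/unit7.docx'}
figure_name = sketch_image_log_name(keywords, 'fig1', '.png')
```

Because ext is appended as-is, the caller is responsible for including the leading dot.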
-
inject_par
(style='Default Paragraph Font', pstyle='Normal', text='')¶ Insert a new paragraph into the document with the specified styles and text, and return it.
The contents of the paragraph will be a single run with the specified text. Any previously existing paragraph and run will be terminated (see end_paragraph) and reinstated with their prior styles once the new content is inserted.
-
insert_picture
(img, flush_existing=True, style='Default Paragraph Font', pstyle='Quote', **kwargs)¶ Insert an image into the current document.
Images must be inserted into a run, so the following cases are recognized:
- Outside <par>: Create a new temporary Paragraph and a new Run. Neither object is retained (i.e. in paragraph and run).
- Inside <par> but outside <run>: Create a new temporary Run, which will not be retained.
- Inside <run>: If the requested style matches the style of the current run, it will be flushed and extended. Otherwise, the current run will be interrupted by a temporary run with the new style, and then reinstated.
It is an error to have a run outside a paragraph.
Parameters:
- img (str or file-like) – The image can be the name of a file on disk, or an open file (including in memory files like io.BytesIO). In the latter case, the file pointer must be at the beginning of the image data.
- style (str) – The name of the Character Style to apply to a new run.
- pstyle (str) – The name of the Paragraph Style to apply if a new paragraph needs to be created.
Two additional keyword-only arguments can be supplied to add_picture: width and height.
-
interrupt_paragraph
(warn=None)¶ A context manager for interrupting the current run/paragraph and resuming it when complete.
The current paragraph and run are ended before the body of the with block executes. They are reinstated afterwards, if they existed to begin with, with the same styles as before.
Parameters: warn (str, bool or None) – If a boolean, determines whether or not to issue a generic warning if a paragraph is actually interrupted. If a string, it is interpreted as the name of the tag that is interrupting the paragraph, and mentioned in the warning. No warning will be issued if falsy. Defaults to None.
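The interrupt-and-resume pattern can be sketched with contextlib. ToyState below is a hypothetical stand-in that tracks only a paragraph style and an event list; it is not the EngineState implementation, and the try/finally restoration is a design choice of this sketch rather than documented behavior.

```python
from contextlib import contextmanager


class ToyState:
    """Hypothetical stand-in that tracks only the current paragraph style."""

    def __init__(self):
        self.paragraph_style = None  # None means "not inside a paragraph"
        self.events = []

    @contextmanager
    def interrupt_paragraph(self):
        saved = self.paragraph_style
        if saved is not None:
            # End the current paragraph before the body of the with block runs.
            self.events.append(f'end paragraph ({saved})')
            self.paragraph_style = None
        try:
            yield
        finally:
            if saved is not None:
                # Reinstate a paragraph with the same style as before.
                self.paragraph_style = saved
                self.events.append(f'resume paragraph ({saved})')


state = ToyState()
state.paragraph_style = 'Normal'
with state.interrupt_paragraph():
    state.events.append('insert table')
```

A tag such as <table> that must appear outside any paragraph could wrap its insertion in a block of this kind.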
-
log
(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log
. XML location meta-data will be inserted into any log messages.
-
new_content
(leading_space=None)¶ Update the content text buffer to a new, empty StringIO.
Calling this method is faster than doing a seek-truncate according to http://stackoverflow.com/a/4330829/2988730.
Parameters: leading_space (tri-state bool) – If None, copy leading_space from the current content. Otherwise, set to the provided value. The default is to copy the existing value.
-
new_run
(tag, style='Default Paragraph Font', pstyle='Normal', check_in_par=True, keep_par=True)¶ Create a new run.
This method handles cases when a run is requested outside a paragraph, or inside an existing run:
- Nested runs are forbidden, but run injection is not.
- Existing content is flushed for injected runs.
- Runs outside a paragraph will generate a temporary paragraph with a default style.
- Missing paragraphs can optionally raise a warning.
- The temporary paragraph can optionally be retained as the current paragraph.
Parameters:
- name (str) – The name of the tag requesting the run. If there is already a run attribute present, setting name='run' will raise an error because of nesting.
- style (str) – The name of the style to use for the new run.
- pstyle (str) – The name of the style to use for a new paragraph, if one has to be created. Moot if there is already a paragraph attribute.
- check_in_par (bool) – Whether or not to warn if not in a paragraph. Defaults to True.
- keep_par (bool) – Whether or not to retain a newly created paragraph object in the paragraph attribute. Moot if there is already a paragraph attribute.
Returns:
- par (docx.text.paragraph.Paragraph) – The paragraph that the run was added to. If keep_par is True or there was already a paragraph attribute set, this will be the paragraph attribute.
- run (docx.text.run.Run) – The newly created run. This will be set to the run attribute unless there is no existing paragraph attribute, and keep_par is set to False.
Notes
Setting keep_par to False for a <run> tag outside a paragraph will cause a situation where run is set but paragraph is not. This may cause a problem for the engine, but should never arise with the builtin parsers.
-
number_paragraph
(list_type, level)¶ Turn the current paragraph into a list item, and store it into last_list_item.
The exact numbering scheme depends on last_list_item, which will be updated to refer to the current paragraph when this method completes.
The following behaviors occur in response to list_type:
- None – Not a list paragraph. Do not set numbering or change last_list_item.
- CONTINUED – Same type and numbering as last_list_item. Set last_list_item.
- NUMBERED – Start a new numbered list. Set last_list_item.
- BULLETED – Start a new bulleted list. Set last_list_item.
-
pop_content_stack
()¶ Reinstate the previous level of the content_stack to the current content.
Calling this method on an empty stack will cause an error. The current content is completely discarded.
-
push_content_stack
(flush=False, leading_space=False)¶ Temporarily create a new text buffer for the content.
If flush is True, the old buffer is flushed to the document and cleared before being pushed to the content_stack. If flush is False, the existing buffer is pushed unchanged.
If the existing buffer is flushed, the buffer that will be reinstated when the new one is popped will have leading_space set to True.
The new buffer can have its leading_space attribute configured by the leading_space parameter, which defaults to False.
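The push and pop mechanics can be sketched with a deque of StringIO buffers. ToyBuffers is a hypothetical stand-in that deliberately leaves out flushing and the leading_space flag, so it only shows how a nested tag gets a temporary buffer and how the previous one is reinstated.

```python
import io
from collections import deque


class ToyBuffers:
    """Hypothetical stand-in for the content and content_stack attributes."""

    def __init__(self):
        self.content = io.StringIO()
        self.content_stack = deque()

    def push_content_stack(self):
        # Save the current buffer and start a fresh one for the nested tag.
        self.content_stack.append(self.content)
        self.content = io.StringIO()

    def pop_content_stack(self):
        # Discard the current buffer and reinstate the previous one.
        # An empty stack raises IndexError here (an assumption of this sketch).
        self.content = self.content_stack.pop()


buffers = ToyBuffers()
buffers.content.write('run text')
buffers.push_content_stack()
buffers.content.write('figure caption')
nested = buffers.content.getvalue()
buffers.pop_content_stack()
restored = buffers.content.getvalue()
```

After the pop, the nested text is gone and the run text is current again, which matches the description of the popped content being discarded.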
-
temp_run
(style='Default Paragraph Font', pstyle='Normal', keep_same=False)¶ Create a temporary run in the current context.
The run and paragraph styles will be preserved after the context manager exits. If the run is injected outside a paragraph, a temporary paragraph will be created and forgotten.
Within the context manager, both paragraph and run are guaranteed to be set. run will have the style named by style, but paragraph will only have the style named by pstyle if it is a temporary paragraph.
All content is flushed into the temporary run when this manager exits.
Parameters:
- style (str) – The style of the new run.
- pstyle (str) – The style of a new paragraph to contain the run. Used only if paragraph is unset.
- keep_same (bool) – If True, and a run already exists, and has the same style as this one, retain it instead of making a new one. If False (the default), always create a new run.
-
-
class
imprint.core.state.
ReferenceState
(registry, log, heading_depth=None)¶ A simple container type used by the reference parser to communicate state to the reference descriptors and accumulate the reference map.
Most of the state is dedicated to monitoring referenceable tags and creating references to them. The engine and built-in tags rely on a set of attributes to function properly. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using in in preference to hasattr.
-
registry
¶ Mapping
A subtype of dict that follows the same rules as tag_registry. Normally a reference to that attribute.
Implemented as a read-only property.
-
references
¶ -
A multi-level mapping type that allows references to be fetched and set by role and attribute. Access to this map is performed by providing a tuple (role, attribute, key). For example:
state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using str.
The map is mutable at this stage in the processing. It accumulates all the referenceable tags found in the document. Setting a value for a key any of whose levels do not exist is completely acceptable: the missing levels will be filled in.
Implemented as a read-only property.
-
heading_depth
¶ -
The configured depth after which heading_counter stops having an effect when a subheading is entered. If omitted entirely (None), all available heading levels will be used.
Implemented as a writable property.
-
heading_counter
¶ -
A list containing counters for each heading level encountered. The list is popped back one element whenever a higher level heading is encountered. len(heading_counter) is the depth of the outline the parser is currently in. E.g., if the parser is parsing text under Section 3.4.5, heading_counter contains [3, 4, 5]. When Section 4 is encountered next, the counter will be reset to [4]. The heading may be referenced later by title or by ID.
A deque is not used because it does not support slice deletion, which makes jumping back a few heading levels much easier.
Implemented as a read-only property.
-
item_counters
¶ -
A mapping of the referenceable roles to the counters of items in the current heading. All the counters are reset to zero when a new heading below heading_depth is encountered.
Implemented as a read-only property. The keys of the mapping should not be modified, but the values may be.
-
content
¶ -
A mutable buffer used by the engine to accumulate text from the XML Template only when necessary.
This attribute should be manipulated mostly through the start_content and end_content methods. It should only be present for tags that care about accumulating content for a reference, like <par>. When present, all content, regardless of nested tags, will be accumulated.
-
__contains__
(name)¶ Checks if the specified name represents an attribute.
-
end_content
()¶ Terminate the current content buffer, if any, and return the content after aggressive stripping of whitespace.
If there is no
content
buffer to begin with, an empty string is returned.
-
format_heading
(prefix=None, prefix_sep=' ', sep='.', suffix_sep='-', suffix=None)¶ Format heading_counter for display.
If suffix is set to a truthy value, only heading_depth items are shown. Otherwise, the entire list is shown.
-
get_content
(default='')¶ Retrieve the text in the current content buffer.
Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is non-existent, empty or contains only whitespace, return default instead.
-
heading_counter
Ensure that
heading_counter
is read-only.
-
heading_depth
Ensure that
heading_depth
is set to a legitimate value.
-
increment_heading
(level)¶ Increment heading_counter at the requested level.
Any missing levels are set to 1 with a warning. Any further levels are truncated. item_counters is reset if heading_depth is unset or a greater value than level.
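The counter update can be sketched on a plain list. The function below is a hypothetical illustration that assumes 1-based levels and uses warnings.warn in place of the engine's logging; it omits the item_counters reset entirely.

```python
import warnings


def increment_heading(counter, level):
    """Hypothetical sketch: update a heading counter list in place."""
    missing = level - 1 - len(counter)
    if missing > 0:
        # Skipped levels are filled in with 1, with a warning.
        warnings.warn(f'{missing} heading level(s) skipped before level {level}')
        counter.extend([1] * missing)
    if len(counter) >= level:
        counter[level - 1] += 1
        # Slice deletion truncates any deeper levels.
        del counter[level:]
    else:
        # First heading at this depth.
        counter.append(1)
    return counter


outline = [3, 4, 5]
increment_heading(outline, 1)  # the next top-level heading after Section 3.4.5
```

The slice deletion in the middle is the operation that the heading_counter description cites as the reason for using a list rather than a deque.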
-
item_counters
Ensure that
item_counters
is read-only.
-
log
(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log
. XML location meta-data will be inserted into any log messages.
-
registry
Ensure that
registry
is read-only.
-
reset_counters
()¶ Set all the values of
item_counters
to zero.
-
-
class
imprint.core.state.
ReferenceMap
¶ A multi-level mapping that stores references in the values.
Values are accessed through a three-level key (role, attribute, key): For a given role, the type of key is determined by the attribute that names the target. Most tags only support attribute='id', but <segment-ref> also supports attribute='title'. key is the actual value of the attribute that is used to identify the reference.
Reference values can be any object whose __str__ method returns the correct replacement text for the reference.
-
__contains__
(key)¶ Checks if this mapping has the specified partial key.
Key may be a single string or a
tuple
with a length between 1 and 3. Checks will be made for the appropriate depth.
-
__getitem__
(key)¶ Retrieve the value for the specified three-level key.
-
static
__new__
(cls, *args, **kwargs)¶ Ensure that the map is unlocked when it is first created.
This way calling
__init__
is not a trick for unlocking the map.
-
__setitem__
(key, value)¶ If this mapping is not locked, set the attribute for the specified three-level key.
If any of the levels are new, they are created along the way.
-
__str__
(indent=2)¶ Creates a pretty representation of this map, with indented heading levels.
-
lock
()¶ Lock this mapping to prevent unintentional modification.
This is a one-time operation. There is no way to unlock. After locking,
__setitem__
will raise an error.
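The behavior of the three-level map can be sketched with nested dictionaries. ToyReferenceMap is a hypothetical illustration, not the actual class; in particular, raising TypeError after locking is an assumption, since the text above only says that an error is raised.

```python
class ToyReferenceMap:
    """Hypothetical sketch of a lockable (role, attribute, key) mapping."""

    def __init__(self):
        self._levels = {}
        self._locked = False

    def __setitem__(self, key, value):
        if self._locked:
            raise TypeError('reference map is locked')  # assumed error type
        role, attribute, name = key
        # Missing levels are created along the way.
        self._levels.setdefault(role, {}).setdefault(attribute, {})[name] = value

    def __getitem__(self, key):
        role, attribute, name = key
        return self._levels[role][attribute][name]

    def __contains__(self, key):
        # Accept a single string or a partial key of one to three parts.
        if isinstance(key, str):
            key = (key,)
        node = self._levels
        for part in key:
            if not isinstance(node, dict) or part not in node:
                return False
            node = node[part]
        return True

    def lock(self):
        # One-way operation: there is no matching unlock.
        self._locked = True


refs = ToyReferenceMap()
refs['figure', 'id', 'my_figure'] = 'Figure 3.4-1'
refs.lock()
```

Once lock has been called, lookups and containment checks keep working, but further assignments fail.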
-
Programs¶
Imprint comes with a set of command-line entry points to facilitate different tasks. This page is the manual for these programs.
imprint¶
The main program of Imprint, serving as the entry point to create documents.
docx2xml¶
A small utility for extracting text content out of existing Word documents.
Placeholders are inserted for every element that appears to be a table or a figure. No attempt is made to preserve the styles of those elements. Paragraph styles are preserved, as are run styles. An attempt is made to merge as many consecutive runs of the same style as possible.
This program can only operate on .docx
files, not on .doc
files.
Command¶
The same command can be run on both Linux and Windows systems. The Windows file
that provides the executable has a .bat
extension and delegates to the
extension-less Python file:
docx2xml input[.docx] [output[.xml]]
Logging¶
The program log is one of the outputs of Imprint. It is generated by the engine and plugins. The log provides traceability into the workings of Imprint, including plugins. As an important part of the user interaction on many levels, a separate document to describe the logging facility is merited.
Configuration¶
Logging is configured through the IPC File. The following keywords are used to configure the logging output:
All keywords are optional. The default is to log WARNING
and worse to
stdout
. If log_stderr is set to
True
, messages with level ERROR
and worse will be sent to
stderr
instead. In general, when both
stdout_level and stderr_level are
True
, stdout
will receive only the messages with
levels greater than or equal to stdout_level, but
strictly less than stderr_level.
If log_file is set to a non-empty string, all messages
will be logged to it regardless of what is written to stdout
and
stderr
.
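The routing of levels between the two streams can be sketched with the standard logging module. This is only an illustration of the thresholds described above: the names BelowLevel, build_logger and the logger name are hypothetical, and the streams are passed in explicitly so that in-memory buffers can stand in for stdout and stderr.

```python
import io
import logging


class BelowLevel(logging.Filter):
    """Pass only records strictly below a ceiling level."""

    def __init__(self, ceiling):
        super().__init__()
        self.ceiling = ceiling

    def filter(self, record):
        return record.levelno < self.ceiling


def build_logger(out_stream, err_stream,
                 stdout_level=logging.WARNING, stderr_level=logging.ERROR):
    """Hypothetical sketch: split records between two streams by level."""
    logger = logging.getLogger('imprint_sketch')
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(min(stdout_level, stderr_level))

    out = logging.StreamHandler(out_stream)
    out.setLevel(stdout_level)                # at or above stdout_level ...
    out.addFilter(BelowLevel(stderr_level))   # ... but strictly below stderr_level
    err = logging.StreamHandler(err_stream)
    err.setLevel(stderr_level)

    logger.addHandler(out)
    logger.addHandler(err)
    return logger


out_buffer, err_buffer = io.StringIO(), io.StringIO()
logger = build_logger(out_buffer, err_buffer)
logger.info('starting')            # below both thresholds: dropped
logger.warning('disk almost full')  # goes to the stdout stand-in
logger.error('handler failed')      # goes to the stderr stand-in only
```

In a real program, sys.stdout and sys.stderr would be passed instead of the buffers.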
The logging format can be controlled by log_format, which is the same type of string that can be passed to the format argument of logging.basicConfig or the fmt argument of logging.Formatter. The template is a %-interpolated format string that refers to the attributes of a logging.LogRecord by name.
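As an illustration of such a template, the sketch below formats a hand-built record with logging.Formatter. The value chosen for log_format, the logger name and the message are all made up for the example.

```python
import logging

# Hypothetical value for log_format; every %(...)s field names an
# attribute of logging.LogRecord.
log_format = '%(levelname)s:%(name)s:%(message)s'

formatter = logging.Formatter(fmt=log_format)

# Build a record by hand so that the example does not depend on any
# handler configuration.
record = logging.LogRecord('imprint.demo', logging.WARNING, 'demo.py', 1,
                           'missing style %r', ('Heading 9',), None)
line = formatter.format(record)
print(line)  # WARNING:imprint.demo:missing style 'Heading 9'
```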
Image Logging¶
If the keyword log_images is truthy, any images that get inserted into the document are also dumped individually to a file. The names of the images are based on the name of the log file (via log_images), or the name of the document if file logging is disabled. The figure, table or string ID is appended after an underscore, and the appropriate extension is added at the end.
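The naming scheme can be sketched as follows. This is an illustration only: image_log_name is a hypothetical helper, and the assumption that the extension of the base file is dropped before the ID is appended is not stated in the text above.

```python
from pathlib import Path


def image_log_name(base, content_id, extension):
    """Build '<base stem>_<id><extension>' next to the base file.

    `base` is the log file name, or the document name when file logging
    is disabled. Dropping the original extension is an assumption.
    """
    base = Path(base)
    return base.with_name(f'{base.stem}_{content_id}{extension}')


print(image_log_name('out/report.log', 'fig1', '.png'))
print(image_log_name('report.docx', 'summary', '.txt'))
```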
Image logging is implemented individually for tags that generate content. It is currently supported for the following tags: <figure>, <latex>, <string> and occasionally for <table>. The strings create small .txt files containing the snippets they generate. Custom tags are expected to respect image logging in a way that makes sense.
Under normal circumstances, the tag descriptor is responsible for logging images. However, in certain cases, the logging can be done by the content handler. Among the built-in tags, this is true for tables, since the variety of input data makes it pointless to generalize the type of logging required (as it is for Figures and Strings).
Logging From Tags¶
The XML Tag API allows users to process custom tags by implementing a TagDescriptor. Tags should use the engine core’s logging facility, provided by the log method of the EngineState. The reason for using the provided log method instead of the local logger is that it will attach information about the parser’s position in the XML file to every record.
Logging From Plugins¶
Unlike tag descriptors, plugin handlers are left to their own devices when it comes to logging. All of the XML location information will be available from the surrounding log records provided by the tag, so no real advantage is to be gained from providing location information. On the other hand, plugins can access the convenience methods provided by Python’s logging framework, such as debug and exception.
The standard procedure for the Builtin Plugins is to get a “private” module-level logger, and use that throughout:
_logger = logging.getLogger(__name__)
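A handler using this pattern might look like the sketch below. The handler itself (mean_handler), its configuration key and its return value are hypothetical; only the use of the module-level logger and the standard debug and exception methods is the point.

```python
import logging

_logger = logging.getLogger(__name__)


def mean_handler(obj_id, config):
    """Hypothetical handler: average the numbers in config['values']."""
    _logger.debug('Computing mean for %s', obj_id)
    try:
        values = config['values']
        return sum(values) / len(values)
    except (KeyError, ZeroDivisionError):
        # exception() logs at ERROR level and appends the traceback.
        _logger.exception('Could not compute mean for %s', obj_id)
        return None
```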
Levels¶
In addition to the normal logging levels provided by the Python logging framework, Imprint sets up the following additional levels:
TRACE
- Used to report on the normal activity of a tag processor or plugin that may be irrelevant for any but the most fine-grained debugging. The priority defaults to 5, which is lower than logging.DEBUG but higher than logging.NOTSET.
XTRACE
- Similar to TRACE, but includes the current exception information by default. The priority defaults to 2, which is below that of TRACE.
All levels are registered with the logging framework as if they were built-in. The appropriate methods are registered with the currently configured default logger class.
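The general technique of registering an extra level can be sketched with the standard library alone. This is not necessarily how Imprint performs the registration; the trace function below is written for illustration, and only the level name and its default priority of 5 come from the text above.

```python
import logging

TRACE = 5  # between logging.NOTSET (0) and logging.DEBUG (10)

# Make the level known by name, as if it were built in.
logging.addLevelName(TRACE, 'TRACE')


def trace(self, msg, *args, **kwargs):
    """Log msg with level TRACE, mirroring Logger.debug and friends."""
    self.log(TRACE, msg, *args, **kwargs)


# Attach the method to whatever logger class is currently configured.
logging.getLoggerClass().trace = trace

logger = logging.getLogger('trace_demo')
logger.setLevel(TRACE)
logger.trace('fine-grained detail about %s', 'a tag')
```

A level like XTRACE could be added the same way, with a method that passes exc_info=True by default.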
Internal API¶
The internals of Imprint are implemented in the imprint.core package.
Some of the internals are exposed to the user through the XML Tag API in imprint.core.tags and imprint.core.state. The remainder is not normally of interest to the user. However, it may be useful for developers and authors of more complex plugins to have access to the internals of the engine.
Parsers¶
imprint.core.parsers implements the parsers used to process the XML Template. These parsers make up the heart of the Engine Layer.
There are currently two parsers: ReferenceProcessor and TemplateProcessor. Both are instances of haggis.files.xml.SAXLoggable. The former creates a table of reference names/titles/locations/numbers that are used by the latter.
class imprint.core.parsers.DocxParserBase¶
Base class that contains common functionality of the XML parsers that make up the Imprint Engine Layer.
This class is only intended to avoid code duplication. It serves no standalone purpose whatsoever.
The XML structure is encoded in the following attributes:
tag_stack¶
A stack with special methods for entering a tag, exiting a tag, etc., with some structural validation. The current tag is always available via the current property. Each tag is pushed as an object containing the tag name, its (edited) attributes, whether or not it expects content and nested tags, and a flag indicating whether or not a warning has been raised for unexpected text if it does not. If the tag gets a data configuration, that will be referenced as well.
class imprint.core.parsers.ReferenceProcessor(heading_depth)¶
The SAX parser that is responsible for pre-computing all the relevant references found within the XML template.
Relevant references are any referenceable tags. This processor maintains its own reference counter based on the occurrence of <figure>, <table> and other tags within <par> tags with Heading styles.
class imprint.core.parsers.TemplateProcessor(keywords, doc, references)¶
A parser to handle the entire document structure with the assumption that a reference mapping has already been made.
It processes all registered tags, generates all the content, replaces all necessary components such as keywords, strings and references.
Much of the processing is handled by the built-in TagDescriptors and the EngineState. The parser itself performs sanity checking of the XML structure based on the requirements specified in the descriptors. In addition to checking attributes, content and nested tags, it performs a simplistic form of XML validation.
The engine state does not get direct access to the data configuration like it does to the keywords. The data configuration is maintained directly by this class:
data_config¶
A dict containing all of the data configuration objects (dictionaries) loaded from the appropriate module if keywords contains a 'data_config' key providing the module file name, and None otherwise. Only document setups that actually use data configuration need to provide a configuration module.
Tag Handling¶
class imprint.core.parsers.RootTag¶
Implement the Root tag, regardless of its name.
The root tag is special because any spurious text found within it gets stashed in a special paragraph.
class imprint.core.parsers.TagStack¶
A deque-based stack that does some basic structural checking of the XML.
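To give a feel for what a deque-based stack with structural checks can look like, here is a minimal sketch. It is not the TagStack implementation: SimpleTagStack, its methods and the checks it performs are hypothetical, and it stores bare tag names rather than full node objects.

```python
from collections import deque


class SimpleTagStack(deque):
    """Hypothetical deque-based stack of open tag names."""

    @property
    def current(self):
        """The innermost open tag, or None if nothing is open."""
        return self[-1] if self else None

    def open(self, name):
        """Record that a tag has been opened."""
        self.append(name)

    def close(self, name):
        """Pop the innermost tag, checking that it matches `name`."""
        if not self:
            raise ValueError(f'closing </{name}> with no open tags')
        if self[-1] != name:
            raise ValueError(f'expected </{self[-1]}>, found </{name}>')
        return self.pop()


stack = SimpleTagStack()
stack.open('par')
stack.open('run')
print(stack.current)
stack.close('run')
print(stack.current)
```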
class imprint.core.parsers.TagStackNode(name, attr, descriptor=None, config=None, open_error=False)¶
A structure for maintaining information about open tags for TemplateProcessor.
All of the attributes except warned are immutable, so while tempting, a namedtuple can not be used.
All attributes are passed to the constructor in the same order that they are listed here. Only the first two are required.
name¶
The name of the tag, not normalized in any way.
attr¶
A plain dict containing the required and optional attributes of the tag. This attribute is mutable and gets passed to both the start and end methods of the tag descriptor. It is not one of the XML library immutable mappings.
descriptor¶
The TagDescriptor object for this tag. This must always be an actual instance of the class, not a delegate object to be wrapped. Defaults to None.
config¶
The Data Configuration dictionary, if the descriptor calls for one, None otherwise (the default). If the descriptor has a data_config attribute set but this attribute is None, then open_error must be set to True.
exception imprint.core.parsers.OpenTagError¶
Used as a goto+label marker when processing opening tags.
As per https://stackoverflow.com/a/41768438/2988730 and https://docs.python.org/3/faq/design.html#why-is-there-no-goto
This error is raised to indicate a non-fatal error that prevents the closing tag from being processed.
Utilities¶
imprint.core.utilities contains general utilities to help the engine create and process docx files.
The configuration loaders in this module are potentially suitable for inclusion in the haggis library.
imprint.core.utilities.aggressive_strip(string)¶
Split a string along newlines, strip surrounding whitespace on each line, and recombine with a single space in place of the newlines.
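The described behavior can be approximated in one line. The function below is a re-implementation written from the description above for illustration, not the library source, and is named aggressive_strip_sketch to make that clear.

```python
def aggressive_strip_sketch(string):
    """Strip each line of `string` and join the lines with single spaces."""
    return ' '.join(line.strip() for line in string.splitlines())


text = """
    Some text that was indented
        to fit nicely in an XML file.
"""
print(repr(aggressive_strip_sketch('  a  \n   b\n c  ')))
```

Note that in this sketch, blank lines in the input become empty strings, so they leave extra spaces in the result; how the real function treats them is not specified here.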
imprint.core.utilities.check_fail_state(fail)¶
Verify that fail is one of the valid options {'raise', 'warn', 'ignore'}.
Raise a ValueError if it is not.
imprint.core.utilities.trigger_fail_state(fail, msg, error_class=<class 'ValueError'>, warn_class=<class 'UserWarning'>)¶
React to a failure according to the value of fail:
- 'ignore': Do nothing.
- 'warn': Raise a warning with message msg and class warn_class (UserWarning by default).
- 'raise': Raise an error with message msg and class error_class (ValueError by default).
Any other value of fail triggers a ValueError.
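The three fail states can be mimicked with a few lines of standard Python. The sketch below is written from the description above and is not the library source; the name trigger_fail_state_sketch and the wording of the error message are made up.

```python
import warnings

_FAIL_STATES = {'raise', 'warn', 'ignore'}


def trigger_fail_state_sketch(fail, msg, error_class=ValueError,
                              warn_class=UserWarning):
    """React to a failure according to `fail` (illustrative only)."""
    if fail not in _FAIL_STATES:
        raise ValueError(f'fail must be one of {sorted(_FAIL_STATES)}')
    if fail == 'raise':
        raise error_class(msg)
    if fail == 'warn':
        # stacklevel=2 attributes the warning to the caller.
        warnings.warn(msg, warn_class, stacklevel=2)
    # 'ignore': fall through and do nothing.


trigger_fail_state_sketch('ignore', 'nothing happens')
```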
imprint.core.utilities.get_handler(handler_name)¶
Load the named plugin handler.
Handlers are callables that take an object ID and configuration dictionary and generate content for a specific tag like <figure>, <table> or <string>.
If the handler is not found as-is, the imprint.handlers package is prefixed to handler_name since that is where all built-in handlers live.
imprint.core.utilities.load_callable(name, package_prefix=None, magic_module_attribute=<haggis.SentinelType object>, instantiate_class=False)¶
Retrieve an arbitrary callable from a module.
The input may be one of six things:
- A module with a magic_module_attribute that contains the callable.
- A callable that implements the correct interface.
- The name of a module containing the magic_module_attribute.
- The name of a callable.
- The name of a module in the package_prefix package.
- The name of a callable in the package_prefix package.
The correct thing is identified as leniently as possible and returned. The returned object is not guaranteed to be the correct thing, just to pass very cursory inspection (e.g., modules must have the magic attribute and any other objects must be callable).
Items 1, 3, 5 are not possible if magic_module_attribute is not specified. Items 5, 6 are not possible if package_prefix is not specified.
This method has one special case. If the object found is a class with a no-arg __init__ method and a __call__ method, an instance rather than the class object is returned. Note that class objects themselves are callable, so if you specify a class without a no-arg __init__ method or without a __call__ method, make sure that __init__ has the signature you require and returns the object that you expect.
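The name-based part of this lookup (items 4 and 6 in the list above) can be sketched with importlib. This is a simplified illustration, not the implementation of load_callable: resolve_callable is a hypothetical name, and the magic module attribute and class instantiation are deliberately left out.

```python
import json
from importlib import import_module


def resolve_callable(name, package_prefix=None):
    """Resolve a dotted name to a callable, trying the prefix second."""
    candidates = [name]
    if package_prefix is not None:
        candidates.append(f'{package_prefix}.{name}')
    for candidate in candidates:
        module_name, _, attr = candidate.rpartition('.')
        if not module_name:
            continue  # a bare name has no module to import from
        try:
            obj = getattr(import_module(module_name), attr)
        except (ImportError, AttributeError):
            continue
        if callable(obj):
            return obj
    raise ValueError(f'no callable named {name!r} found')


# The name is tried as-is first, then inside the prefix package.
print(resolve_callable('json.dumps') is json.dumps)
print(resolve_callable('dumps', package_prefix='json') is json.dumps)
```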
Perform a keyword replacement on all valid new-style format strings in the header and footer XML of a Word document.
This operation is currently done by treating the XML as if it were a giant string. The assumption is valid but hacky, since format-like strings delimited by ‘{}’ are unlikely to appear anywhere outside <w:t> tags.
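The string-based approach can be illustrated with a regular expression. The sketch below is an assumption-laden stand-in, not the Imprint routine: replace_keywords is a hypothetical name, only plain {name} fields are handled (no format specifications), and unknown fields are left untouched.

```python
import re

# Matches simple new-style fields such as {title}. Format specs and
# attribute access are ignored in this sketch.
_FIELD = re.compile(r'\{(\w+)\}')


def replace_keywords(xml, keywords):
    """Replace every known {keyword} in the XML text with its value."""
    def substitute(match):
        key = match.group(1)
        return str(keywords[key]) if key in keywords else match.group(0)
    return _FIELD.sub(substitute, xml)


xml = '<w:p><w:r><w:t>{title} rev. {revision}</w:t></w:r></w:p>'
result = replace_keywords(xml, {'title': 'Test Report', 'revision': 3})
print(result)
```

Because the XML is treated as plain text, replacement values containing characters such as < or & would need to be escaped first, for example with xml.sax.saxutils.escape.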
Dependencies¶
Python¶
Imprint requires Python version 3.6 or higher.
Core¶
The core program depends only on three libraries in addition to the built-in Python libraries:
- python-docx: A library for creating documents in Office Open XML format.
- lxml: An XML manipulation library that is also a dependency of python-docx.
- haggis: A suite of Python utilities developed by the author of Imprint to support common functionality across multiple tools, including Imprint itself. All additional dependencies come indirectly from Haggis.
Content-generation plugins generally tend to have a much wider set of dependencies.
Documentation¶
This documentation is built with sphinx (version >= 1.7.1 required).
The API documentation requires the napoleon extension, which is now bundled with sphinx itself.
The default viewing experience for the documentation is provided by the ReadTheDocs Theme, which is, however, optional. If installed, a version >= 0.4.0 is recommended[1].
Plugins¶
There is almost no restriction on what Imprint plugin code can depend on. In fact, plugins can use a wide variety of open source tools and libraries for tasks like graphics rendering and file conversion. Both Python libraries and external programs can be dependencies for plugins, since the Python subprocess module supports running arbitrary executables. The lists below show a sample[2] of dependencies used by the Builtin Plugins:
Python Packages¶
- numpy: A fast array library for Python. This supports most of the data processing done in Imprint as numpy arrays are virtually ubiquitous in Python. This is a dependency of scipy and pillow.
- scipy: A scientific computation library for Python. In addition to enhancements to numpy, it supplies interfaces to scientific file formats such as IDL files.
- matplotlib: A plotting and graphics library for Python. Much of the data visualization is done through this library.
- pandas: A spreadsheet library for easily manipulating tables.
- pillow: A graphics file library for Python. Used to import images and convert image files.
- natsort: A small natural text-sorting library for Python. It provides advanced sorting techniques that are more intuitive than plain lexicographical sorting, e.g., for strings containing both text and numbers.
External Programs¶
- ImageMagick: A suite of image conversion programs suitable for almost any reasonable format. Mostly the convert program is used, e.g., to create LaTeX equations for the <latex> tag.
- Poppler: A library for manipulating PDF files. In particular the pdftoppm program is used to convert PDF files into importable images.
- GhostScript (gs): Converts PostScript documents into importable images. This is particularly useful for dealing with some of the more flexible backends provided by matplotlib, especially when it comes to LaTeX equations.
- LaTeX: Some implementation of LaTeX is necessary to support in-text LaTeX equations. texlive and pdflatex are examples of implementations that have been used successfully in testing on Linux systems. Only documents containing the <latex> XML tag require this.
- dvips: A converter between DVI and PostScript formats is necessary to bridge the formats supported by latex and convert. This is only a dependency for documents that contain <latex> tags. This program is almost always bundled with reasonable LaTeX distributions.
Dependence on external programs generally represents a restriction to portability across platforms. This is often not a major issue because many standard programs are available for Linux and Mac environments, and generally, a particular configuration of Imprint plugins will be used in a fairly static environment.
Footnotes
[1] | Versions prior to 0.4.0 had issues with the alignment of line numbers to code in the tutorial examples. |
[2] | These lists are not exhaustive, but should cover most of the interesting items encountered in general use. All items required for the Builtin Plugins are covered. |
Restrictions¶
While Imprint is an extremely complex and flexible system, there are in fact certain things it cannot do. The following list contains the major omissions, with a brief explanation of the underlying reasons for each one:
1. Updating the TOC: each newly generated document requires the user to right-click on the empty table of contents and manually select “Update Table”. This is necessary because calculating the page number of the headers would require a rendering engine equivalent to MS Word.
2. Header and Footer Parseability: in some cases, the XML of the Headers and Footers must be massaged manually to ensure that there are no spurious run-breaks within a keyword-replacement directive. Word will sometimes chunk up text into runs when it is not strictly necessary, resulting in the need for this manual massaging. The root cause is that the python-docx library does not currently have support for headers and footers.
Development¶
You can contribute to Imprint by providing bug reports (or just usage experience), or writing code. Issues can be submitted on GitHub at https://github.com/madphysicist/imprint/issues. To contribute code, fork and clone the repository from https://github.com/madphysicist/imprint. You can modify the code as you wish, and submit a pull request through GitHub.
Branch Structure¶
Feature branches should be branched from dev. Accepted features should be squashed into a small number of commits. When a sufficient number of commits are made, they will be added to master, the minor version will increment, and a release candidate branch will start.
Installation¶
Installing the project is not strictly necessary for development. That being said, some features may be better tested when the project is installed. Developers can install their local copy for testing by running the following in the project root:
python setup.py develop
This will symlink the development project to the site packages of the current python environment. It is recommended that this command be run in a dedicated virtual environment.
Coding¶
Feel free to suggest and/or implement any feature that you feel is useful. The general philosophy is to keep things modular. General purpose functions should be added to the haggis library rather than to Imprint itself.
The documentation should explain how the project works. Introduction to Imprint provides a high-level explanation of the overall structure. The Reference section contains all the references to the individual components.
Testing¶
At the moment, there is no test package for Imprint. Instead, the Demos in the documentation provide good coverage of almost all available features. If you add a feature that is not already covered, please add a section to the appropriate Tutorials page, and modify the corresponding demo (or add a new one) as necessary.
If you would like to contribute a test package to Imprint, that would be wonderful.
Future Work¶
This section details some of the prominent features that are currently being proposed or already implemented for Imprint, but are not a part of the main baseline. This is not an exhaustive list, and does not contain any of the minor bug fixes and enhancements that come naturally with any project of this scope.
Further requests and issues should be raised on the GitHub issues page.
Configurable XML Root Tag¶
The name of the XML root tag can be configured through a key in the *.ipc file. If the input_xml_root keyword is missing, the default will remain imprint-template.
Full MathML Support¶
Imprint will have full MathML support out of the box. At the moment, the details of the interface are being worked out. Currently, a <math> tag simply includes all the XML found inside it verbatim into the OOXML document structure.
Caching of Data¶
Rather than ensuring that the same loader is used for all datasets, as the current system does, it is better to create a cache of weak references to named datasets, with clear loading instructions by data name rather than handler name. This will only improve the speed of Imprint, and is therefore not of prime importance.
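One possible shape for such a cache is sketched below using weakref.WeakValueDictionary. Since this feature is only proposed, everything here is hypothetical: the Dataset and DatasetCache classes, the loader callable, and the idea of keying the cache by data name alone.

```python
import weakref


class Dataset:
    """Hypothetical dataset wrapper.

    A dedicated class is used because plain lists and dicts cannot be
    weakly referenced.
    """

    def __init__(self, name, values):
        self.name = name
        self.values = values


class DatasetCache:
    """Cache datasets by name for as long as something else uses them."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = weakref.WeakValueDictionary()

    def get(self, name):
        dataset = self._cache.get(name)
        if dataset is None:
            # Not loaded yet, or already garbage collected: load again.
            dataset = self._loader(name)
            self._cache[name] = dataset
        return dataset


cache = DatasetCache(lambda name: Dataset(name, [1.0, 2.0, 3.0]))
first = cache.get('scan-01')
second = cache.get('scan-01')
print(first is second)
```

Entries disappear from the cache once no handler holds a reference to the dataset, so memory is not held longer than needed.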
User Defaults File¶
Create a file with user-level defaults. This will be a .imprint file in the user directory on Linux systems. It will be a mix of a default IPC File and options for hard-coded default styles, as well as anything else that the user uses consistently as a fallback.
An environment variable, something like IMPRINTDEFS, will allow the user to override this option, along with a -D command-line option to imprint.
Clickable Anchors¶
<figure-ref>, <table-ref> and especially <segment-ref> tags should be replaced with a clickable link-field in the output document. This won’t affect the printed version much, but would be a very nice feature to have.
PowerPoint Presentations¶
Since the python-pptx library supports a similar low-level interface to python-docx, it is possible to eventually extend Imprint to generate PowerPoint presentations. This is not a high priority because the nature of the PowerPoint medium is such that most presentations tend to be very unique. Word documents tend to be more suitable for cookie cutter generation.
PDF Documents¶
While support for PDF output may be desirable from a portability standpoint, MS Word is fairly ubiquitous, and PDFs are not as editable. This is also a low priority item.
Default DOCX Stub¶
Given a User Defaults File, a default docx stub will be referenced in that file, which will guarantee the existence of all the referenced styles. This allows detailed per-organization or per-project configuration of the styles that get used.
Default Plugin Prefix¶
Add a single default plugin prefix to A) the config file, which would override the value in B) the User Defaults File. The default-default should be something like imprint.handlers.
Indices and tables¶
Project home page: https://github.com/madphysicist/imprint