Internal API

The internals of Imprint are implemented in the imprint.core package. Some of the internals are exposed to the user through the XML Tag API in imprint.core.tags and imprint.core.state. The remainder is not normally of interest to the user. However, it may be useful for developers and authors of more complex plugins to have access to the internals of the engine.

Parsers

imprint.core.parsers implements the parsers used to process the XML Template. These parsers make up the heart of the Engine Layer.

There are currently two parsers: ReferenceProcessor and TemplateProcessor. Both are instances of haggis.files.xml.SAXLoggable. The former creates a table of reference names, titles, locations and numbers that is used by the latter.
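
The two-pass arrangement can be illustrated with the standard xml.sax module. This is a minimal sketch, not Imprint's code: the handler classes, the <ref> tag and the id and target attributes are assumptions made for the example. It shows why a separate reference pass is useful: a reference may point at a figure that appears later in the template.

```python
import xml.sax


class ReferencePass(xml.sax.ContentHandler):
    """First pass (hypothetical): number every <figure> that has an id."""

    def __init__(self):
        super().__init__()
        self.references = {}
        self._count = 0

    def startElement(self, name, attrs):
        if name == 'figure' and 'id' in attrs:
            self._count += 1
            self.references[attrs['id']] = self._count


class TemplatePass(xml.sax.ContentHandler):
    """Second pass (hypothetical): resolve <ref> tags using the table."""

    def __init__(self, references):
        super().__init__()
        self.references = references
        self.output = []

    def startElement(self, name, attrs):
        if name == 'ref':
            number = self.references[attrs['target']]
            self.output.append(f'Figure {number}')


# The <ref> appears before the figure it points to.
template = b'<doc><ref target="b"/><figure id="a"/><figure id="b"/></doc>'

first = ReferencePass()
xml.sax.parseString(template, first)

second = TemplatePass(first.references)
xml.sax.parseString(template, second)

print(second.output)  # ['Figure 2']
```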

class imprint.core.parsers.DocxParserBase

Base class that contains common functionality of the XML parsers that make up the Imprint Engine Layer.

This class is only intended to avoid code duplication. It serves no standalone purpose whatsoever.

The XML structure is encoded in the following attributes:

tag_stack

A stack with special methods for entering a tag, exiting a tag, etc., with some structural validation. The current tag is always available via the current property. Each tag is pushed as an object containing the tag name, its (edited) attributes, whether or not it expects content and nested tags, and a flag indicating whether a warning has already been raised for unexpected text in a tag that does not expect content. If the tag gets a data configuration, that will be referenced as well.

class imprint.core.parsers.ReferenceProcessor(heading_depth)

The SAX parser that is responsible for pre-computing all the relevant references found within the XML template.

Relevant references are any referenceable tags. This processor maintains its own reference counter based on the occurrence of <figure>, <table> and other tags within <par> tags with Heading styles.

class imprint.core.parsers.TemplateProcessor(keywords, doc, references)

A parser to handle the entire document structure with the assumption that a reference mapping has already been made.

It processes all registered tags, generates all the content, and replaces all necessary components such as keywords, strings and references.

Much of the processing is handled by the built-in TagDescriptors and the EngineState. The parser itself performs sanity checking of the XML structure based on the requirements specified in the descriptors. In addition to checking attributes, content and nested tags, it performs a simplistic form of XML validation.

The engine state does not get direct access to the data configuration like it does to the keywords. The data configuration is maintained directly by this class:

data_config

A dict containing all of the data configuration objects (dictionaries) loaded from the appropriate module if keywords contains a 'data_config' key providing the module file name, and None otherwise. Only document setups that actually use data configuration need to provide a configuration module.

Tag Handling

class imprint.core.parsers.RootTag

Implement the Root tag, regardless of its name.

The root tag is special because any spurious text found within it gets stashed in a special paragraph.

class imprint.core.parsers.TagStack

A deque-based stack that does some basic structural checking of the XML.

stack

The actual stack deque, implemented as a read-only property.

current

The current node. This is just the rightmost node in the stack, or None if the stack is empty. Also a read-only property.
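
A deque-based stack with these two read-only properties might look roughly like the sketch below. The class name and the push and pop methods are hypothetical, and nodes are reduced to plain tag names for brevity; the real TagStack stores TagStackNode objects and performs more validation.

```python
from collections import deque


class TagStackSketch:
    """Illustrative stand-in for a deque-based tag stack."""

    def __init__(self):
        self._stack = deque()

    @property
    def stack(self):
        # Read-only: the property has no setter.
        return self._stack

    @property
    def current(self):
        # Rightmost node, or None when the stack is empty.
        return self._stack[-1] if self._stack else None

    def push(self, name):
        # Hypothetical method: enter a tag.
        self._stack.append(name)

    def pop(self, name):
        # Hypothetical method: exit a tag, checking that it matches
        # the tag that was opened most recently.
        if self.current != name:
            raise ValueError(
                f'Closing <{name}> but current tag is <{self.current}>')
        return self._stack.pop()


tags = TagStackSketch()
tags.push('par')
tags.push('figure')
print(tags.current)  # figure
tags.pop('figure')
print(tags.current)  # par
```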

class imprint.core.parsers.TagStackNode(name, attr, descriptor=None, config=None, open_error=False)

A structure for maintaining information about open tags for TemplateProcessor.

All of the attributes except warned are immutable. A namedtuple is therefore tempting, but it can not be used because warned must remain assignable after construction.

All attributes are passed to the constructor in the same order that they are listed here. Only the first two are required.

name

The name of the tag, not normalized in any way.

attr

A plain dict containing the required and optional attributes of the tag. This attribute is mutable and gets passed to both the start and end methods of the tag descriptor. It is not one of the XML library immutable mappings.

descriptor

The TagDescriptor object for this tag. This must always be an actual instance of the class, not a delegate object to be wrapped. Defaults to None.

config

The Data Configuration dictionary, if the descriptor calls for one, None otherwise (the default). If the descriptor has a data_config attribute set but this attribute is None, then open_error must be set to True.

open_error

Lets the closing tag know that a non-fatal error occurred on opening, so the closing tag processor should be ignored. Defaults to False.

warned

Indicates that a text content warning has already been issued for a tag that has a content flag set to False when nested text is found. Otherwise remains False. This attribute can not be set by the user on initialization.
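
One way to get fixed fields alongside a single assignable warned flag is a small class with read-only properties. The sketch below is an assumption about how such a node could be written, not the actual TagStackNode implementation; only the constructor signature and attribute names come from the description above.

```python
class TagStackNodeSketch:
    """Illustrative stand-in, not the Imprint class."""

    __slots__ = ('_name', '_attr', '_descriptor', '_config',
                 '_open_error', 'warned')

    def __init__(self, name, attr, descriptor=None, config=None,
                 open_error=False):
        self._name = name
        self._attr = attr
        self._descriptor = descriptor
        self._config = config
        self._open_error = open_error
        # Not a constructor argument: always starts out False.
        self.warned = False

    # Read-only views of the fixed fields (no setters defined).
    name = property(lambda self: self._name)
    attr = property(lambda self: self._attr)
    descriptor = property(lambda self: self._descriptor)
    config = property(lambda self: self._config)
    open_error = property(lambda self: self._open_error)


node = TagStackNodeSketch('figure', {'id': 'fig1'})
node.warned = True           # allowed: warned is a plain slot
try:
    node.name = 'table'      # rejected: name has no setter
except AttributeError:
    print('name is read-only')
```

Note that read-only here refers to rebinding the attribute; the attr dict itself can still be modified in place.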

exception imprint.core.parsers.OpenTagError

Used as a goto+label marker when processing opening tags.

As per https://stackoverflow.com/a/41768438/2988730 and https://docs.python.org/3/faq/design.html#why-is-there-no-goto

This error is raised to indicate a non-fatal error that prevents the closing tag from being processed.
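
The goto-style use of an exception can be sketched as follows. The function, its required parameter and the exception name are hypothetical; the point is only that raising inside the try block jumps straight to a single handler that records the failure, which is then stored as open_error for the closing tag.

```python
class OpenTagErrorSketch(Exception):
    """Stand-in for OpenTagError: marks a non-fatal opening failure."""


def open_tag_sketch(name, attr, required):
    """Return the open_error flag for a tag (hypothetical helper)."""
    try:
        for key in required:
            if key not in attr:
                # Jump to the handler below, skipping the remaining steps.
                raise OpenTagErrorSketch(
                    f'<{name}> is missing required attribute {key!r}')
        # ... normal start-of-tag processing would go here ...
    except OpenTagErrorSketch as err:
        print(f'warning: {err}')
        return True   # closing tag processing should be skipped
    return False


print(open_tag_sketch('figure', {'id': 'a'}, required=['id', 'handler']))
print(open_tag_sketch('figure', {'id': 'a', 'handler': 'h'},
                      required=['id', 'handler']))
```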

Utilities

imprint.core.utilities contains general utilities to help the engine create and process docx files.

The configuration loaders in this module are potentially suitable for inclusion in the haggis library.

imprint.core.utilities.aggressive_strip(string)

Split a string along newlines, strip surrounding whitespace on each line, and recombine with a single space in place of the newlines.
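
The described behavior corresponds roughly to the one-liner below. This is a sketch under a different name, so details such as the handling of blank lines may differ from the real function.

```python
def aggressive_strip_sketch(string):
    # Split on newlines, strip each line, rejoin with single spaces.
    return ' '.join(line.strip() for line in string.splitlines())


print(aggressive_strip_sketch('  first line  \n    second line\n'))
# first line second line
```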

imprint.core.utilities.check_fail_state(fail)

Verify that fail is one of the valid options {'raise', 'warn', 'ignore'}.

Raise a ValueError if it is not.

imprint.core.utilities.trigger_fail_state(fail, msg, error_class=<class 'ValueError'>, warn_class=<class 'UserWarning'>)

React to a failure according to the value of fail:

  • 'ignore': Do nothing
  • 'warn': Raise a warning with message msg and class warn_class (UserWarning by default).
  • 'raise': Raise an error with message msg and class error_class (ValueError by default).

Any other value of fail triggers a ValueError.
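
The behavior of check_fail_state and trigger_fail_state, as described, can be approximated with the sketch below. The function names carry a _sketch suffix to make clear that this is an illustration, not the library source; the error message text is an assumption.

```python
import warnings

VALID_FAIL_STATES = {'raise', 'warn', 'ignore'}


def check_fail_state_sketch(fail):
    if fail not in VALID_FAIL_STATES:
        raise ValueError(f'Invalid fail state: {fail!r}')


def trigger_fail_state_sketch(fail, msg, error_class=ValueError,
                              warn_class=UserWarning):
    # Any value outside the valid set triggers a ValueError.
    check_fail_state_sketch(fail)
    if fail == 'raise':
        raise error_class(msg)
    if fail == 'warn':
        warnings.warn(msg, warn_class)
    # 'ignore': do nothing


trigger_fail_state_sketch('ignore', 'nothing happens')
try:
    trigger_fail_state_sketch('raise', 'missing key', error_class=KeyError)
except KeyError as err:
    print('raised', type(err).__name__)  # raised KeyError
```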

imprint.core.utilities.get_handler(handler_name)

Load the named plugin handler.

Handlers are callables that take an object ID and configuration dictionary and generate content for a specific tag like <figure>, <table> or <string>.

If the handler is not found as-is, the imprint.handlers package is prefixed to handler_name since that is where all built-in handlers live.
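
The lookup-then-prefix fallback can be sketched with importlib. The helper names below are hypothetical, and the real get_handler may well delegate to load_callable instead; the demonstration uses the standard json package as the prefix so that it runs without Imprint installed.

```python
from importlib import import_module


def import_object(dotted):
    """Import 'package.module.attr' and return attr (hypothetical helper)."""
    module_name, _, attr = dotted.rpartition('.')
    if not module_name:
        raise ImportError(f'{dotted!r} is not a dotted name')
    module = import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError as err:
        raise ImportError(f'{module_name!r} has no attribute {attr!r}') from err


def get_handler_sketch(handler_name, prefix='imprint.handlers'):
    try:
        # First try the name exactly as given.
        return import_object(handler_name)
    except ImportError:
        # Fall back to the package where built-in handlers live.
        return import_object(f'{prefix}.{handler_name}')


# Stand-in demonstration with a stdlib package as the prefix.
print(get_handler_sketch('dumps', prefix='json').__name__)       # dumps
print(get_handler_sketch('json.loads', prefix='json').__name__)  # loads
```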

imprint.core.utilities.load_callable(name, package_prefix=None, magic_module_attribute=<haggis.SentinelType object>, instantiate_class=False)

Retrieve an arbitrary callable from a module.

The input may be one of six things:

  1. A module with a magic_module_attribute that contains the callable.
  2. A callable that implements the correct interface.
  3. The name of a module containing the magic_module_attribute.
  4. The name of a callable.
  5. The name of a module in the package_prefix package.
  6. The name of a callable in the package_prefix package.

The correct thing is identified as leniently as possible and returned. The returned object is not guaranteed to be the correct thing, just to pass very cursory inspection (e.g., modules must have the magic attribute and any other objects must be callable).

Items 1, 3, 5 are not possible if magic_module_attribute is not specified. Items 5, 6 are not possible if package_prefix is not specified.

This function has one special case. If the object found is a class with a no-arg __init__ method and a __call__ method, an instance rather than the class object is returned. Note that class objects themselves are callable, so if you specify a class without a no-arg __init__ method or without a __call__ method, make sure that __init__ has the signature you require and returns the object that you expect.
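
The special case for classes can be illustrated with the inspect module. The sketch below is one possible way to detect a class that has a no-arg constructor and callable instances; it is an assumption about the check, not the test that load_callable actually performs.

```python
import inspect


def accepts_no_args(cls):
    """True if cls() can be called without arguments (best effort)."""
    try:
        params = inspect.signature(cls).parameters.values()
    except (TypeError, ValueError):
        return False
    optional_kinds = (inspect.Parameter.VAR_POSITIONAL,
                      inspect.Parameter.VAR_KEYWORD)
    return all(p.default is not inspect.Parameter.empty
               or p.kind in optional_kinds for p in params)


def maybe_instantiate(obj):
    """Return an instance for qualifying classes, else obj unchanged."""
    if inspect.isclass(obj):
        # Look for __call__ on the class itself, not on its metaclass.
        has_call = any('__call__' in vars(base) for base in obj.__mro__)
        if has_call and accepts_no_args(obj):
            return obj()
    return obj


class Handler:
    def __call__(self, value):
        return value * 2


class NeedsScale:
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, value):
        return value * self.scale


print(type(maybe_instantiate(Handler)).__name__)   # Handler
print(maybe_instantiate(NeedsScale) is NeedsScale)  # True
```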

imprint.core.utilities.substitute_headers_and_footers(doc_file_name, keywords)

Perform a keyword replacement on all valid new-style format strings in the header and footer XML of a Word document.

This operation is currently done by treating the XML as if it were a giant string. The assumption is valid but hacky, since format-like strings delimited by ‘{}’ are unlikely to appear anywhere outside <w:t> tags.
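
The string-level approach can be sketched as below. This simplified stand-in handles only plain {name} fields and leaves unknown names untouched; it does not read or write docx files, does not support format specifications, and does not XML-escape the substituted values, all of which the real function may treat differently.

```python
import re

# Matches simple replacement fields such as {title}.
_FIELD = re.compile(r'\{(\w+)\}')


def substitute_keywords_sketch(xml_text, keywords):
    def replace(match):
        key = match.group(1)
        # Leave fields that are not known keywords exactly as they are.
        return str(keywords[key]) if key in keywords else match.group(0)

    return _FIELD.sub(replace, xml_text)


header_xml = '<w:p><w:r><w:t>{title}, rev. {revision}</w:t></w:r></w:p>'
print(substitute_keywords_sketch(header_xml,
                                 {'title': 'Report', 'revision': 3}))
# <w:p><w:r><w:t>Report, rev. 3</w:t></w:r></w:p>
```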