XML Tag API¶
The Imprint engine comes with a complete set
of processors for the tags specified in the XML Template Specification. However, additional
tags may be necessary for highly customized applications, so an API exists for
defining and registering new tags. The API is defined in the
imprint.core.tags module. Example usage can be found in the
Writing Custom Tags tutorial.
Tag Descriptors¶
The tag API revolves around the TagDescriptor class. The class can
be extended directly, or instantiated through a delegate object that fulfills
the necessary duck-type API. Objects contain a set of attributes and two
callbacks that define how to handle XML tags of a given type. All the elements
are optional and have sensible default values.
Any registered object will be viewed through TagDescriptor.wrap, so
it is not necessary to extend or instantiate TagDescriptor to
create a working tag descriptor.
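As a rough illustration of the duck-type API, a delegate can be an ordinary object that supplies only some of the descriptor elements. The sketch below does not import Imprint: the class name, the `shout` tag and the `output` attribute are hypothetical, and a `SimpleNamespace` stands in for the real engine state. It only shows the callback signature described in this document (state, tag name, attribute dictionary).

```python
from types import SimpleNamespace


class ShoutDelegate:
    """Hypothetical delegate that supplies only part of the attribute set."""

    required = ('level',)        # attributes that must appear on the tag
    optional = {'suffix': '!'}   # optional attributes and their defaults

    def start(self, state, name, attr):
        # Remember something on the state for `end` to pick up.
        state.shout_level = int(attr['level'])

    def end(self, state, name, attr):
        state.output.append(name.upper() + attr['suffix'] * state.shout_level)
        # Delete the temporary attribute instead of setting it to None.
        del state.shout_level


# Stand-in for the engine state (an assumption made for this sketch only).
state = SimpleNamespace(output=[])
delegate = ShoutDelegate()
attr = {'level': '2', 'suffix': '!'}
delegate.start(state, 'shout', attr)
delegate.end(state, 'shout', attr)
```

In a real application the delegate would be registered with the engine rather than called by hand.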
Errors¶
Tag descriptors may raise any type of error they deem necessary in their
start and end methods. Most
classes of errors will be logged and cause the application to abort. However,
two special classes of errors will not cause a fatal crash:
- KnownError is used to flag known conditions that can be handled gracefully by the tag.
- OSError. Specifically, the FileNotFoundError and PermissionError subclasses are deemed to be “known errors”. If they represent a fatal condition, they should be wrapped in another exception type.
Any plugins with a dynamic Data Configuration will generally receive an alt-text placeholder where the content would normally go instead of completely aborting.
-
exception
imprint.core.KnownError¶ A custom exception class that is used by the engine to indicate that a tag or plugin handler exited for a known reason.
In cases where this exception is logged, the message is printed without a stack trace.
Configuration¶
Tags have two types of configuration available to them. Static configuration for a given XML Template is provided through the tag attributes in the XML file. Dynamic configuration through the IDC File can be enabled to provide per-document fine-tuning.
XML Attributes¶
XML attributes are supplied to the start and
end methods of a TagDescriptor as the
second argument. The inputs are presented to both methods as a vanilla
dict. The dictionary is meant to be treated as read-only, but this
is not a requirement, meaning that technically start
can modify what end sees. The dictionary is filtered
to exclude any attributes that are not listed in the
required and optional
elements of the TagDescriptor.
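The filtering step can be pictured with a small stand-alone sketch. The function name, the use of KeyError for a missing required attribute, and the filling-in of optional defaults are assumptions made for illustration. The engine's actual implementation may differ.

```python
def filter_attributes(raw, required=(), optional=None):
    """Keep only the attributes a descriptor declares (illustrative sketch)."""
    optional = dict(optional or {})
    missing = [name for name in required if name not in raw]
    if missing:
        # Assumption: the real engine may raise a different error type.
        raise KeyError('missing required attributes: ' + ', '.join(missing))
    filtered = {name: raw[name] for name in required}
    for name, default in optional.items():
        # Assumption: defaults are substituted for absent optional attributes.
        filtered[name] = raw.get(name, default)
    return filtered


filtered = filter_attributes(
    {'id': 'fig1', 'style': 'Wide', 'unknown': 'x'},
    required=('id',),
    optional={'style': None, 'width': '6in'},
)
```

Here `unknown` is dropped because it appears in neither the required nor the optional element.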
Data Configuration¶
For some types of content, static configuration is not enough. To allow
per-document configurations, a TagDescriptor must define a
non-None data_config attribute. This
attribute gives the name of the dictionary to extract from the
IDC File.
start and end methods of a
TagDescriptor with the data_config
attribute set will receive an additional input argument containing the
Data Configuration loaded from the IDC File.
The data configuration can override some of the static XML Attributes of a tag. For built-in tags, the XML Template Specification notes which attributes can be overridden. Built-in tags that support dynamic configuration are <figure>, <table> and <string>.
All built-in tags that support dynamic configuration also support a type of plugin, but this is not a requirement for custom tags.
References¶
A TagDescriptor is referenceable if it has a
non-None reference. A reference made to a
tag will be substituted by the appropriate reference text. By default reference
tags have the target tag name with “-ref” appended:
<figure-ref> references <figure>,
<table-ref> references <table>. A notable
exception is <segment-ref>, which references paragraphs
(<par> tags), but only ones that have a heading style.
References are usually identified by a required id attribute. Segments can
also be identified by the title of the segment, which is the aggressively
trimmed collection of all the text in the paragraph. For example,
the title of the following XML snippet would be 'Example Heading':
<par style="Heading 3">
<run style="Default Paragraph Font">
Example
Heading
</run>
</par>
<segment-ref> tags can therefore identify their target with
either an id or title attribute. User-defined tags can implement their
own customized rules for identifying targets.
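One plausible reading of "aggressively trimmed" is that every run of whitespace, including line breaks and indentation, collapses to a single space. The helper below is a sketch under that assumption. Its name is hypothetical and it is not taken from the Imprint source.

```python
def segment_title(text):
    """Collapse all whitespace runs to single spaces and strip the ends."""
    return ' '.join(text.split())


# The text content of the <run> element in the heading snippet above.
raw = """
    Example
    Heading
"""
title = segment_title(raw)
```

With this rule, the snippet yields the title 'Example Heading', matching the description.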
Roles¶
For the purpose of creating references, any tag may impersonate, or play the role of, any other tag using a special role attribute. This attribute is implicitly optional for every tag. It is interpreted directly by the parsers in the Engine Layer to determine the type of reference that a tag will represent.
For example, a <table> tag (or any other tag for that
matter), which has role="figure" must be referenced by a
<figure-ref> tag, not a <table-ref> tag,
in the XML Template. That table will be a figure for the purposes
of the document in question.
Any arbitrary tag can be referenced the same way with the appropriate role. Usually, such a referenceable tag will be styled appropriately, and will have the headings, captions, etc. appropriate for its role rather than its nominal tag.
A specific case is arbitrary tags that have a <par> role.
Such tags are automatically referenceable by <segment-ref>.
Their entire contents will be treated as the title of the heading, so the
par role must be used carefully.
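The role rules can be summarized in a short sketch. Both function names are hypothetical. The only behavior taken from this document is that an explicit role attribute overrides the tag name, that reference tags default to the target name plus "-ref", and that the par role is referenced through <segment-ref>.

```python
def computed_role(name, attr):
    """Return the role a tag plays: the role attribute if given, else its name."""
    return attr.get('role', name)


def reference_tag_for(name, attr):
    """Name of the reference tag that must be used to point at this tag."""
    role = computed_role(name, attr)
    # Paragraph-like targets are the documented exception to the "-ref" rule.
    return 'segment-ref' if role == 'par' else role + '-ref'


plain_table = reference_tag_for('table', {'id': 't1'})
table_as_figure = reference_tag_for('table', {'id': 't2', 'role': 'figure'})
custom_heading = reference_tag_for('my-tag', {'role': 'par'})
```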
Registering New Tags¶
Once a TagDescriptor or a delegate object has been constructed,
there are two main ways to get Imprint to use the descriptor for actual tag
processing.
Via Configuration¶
In the normal course of things, Imprint will not automatically import unspecified user-defined modules. To let it know where to find tag extensions, add them, by name or by reference, to the mapping in the tags keyword of the IPC File. This will automatically import all the necessary modules, and register the custom descriptor under the requested tag name.
Programmatically¶
Under the hood, tags are registered with the Imprint core simply by adding them
to tag_registry:
tag_registry[name] = descriptor
The registry is a special mapping that ensures that name is a string not
representing an existing tag. While it is not possible to remove or overwrite
existing tags, the same descriptor can be registered under multiple names.
This method is useful mostly to users wishing to write a custom driver program for the engine. Under normal circumstances, the configuration solution will be more suitable.
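The behavior of such a registry can be approximated with a small mapping class. This is a stand-alone sketch, not the Imprint implementation: the class name is hypothetical, and the error types follow the description of tag_registry in the API reference (KeyError on overwrite, TypeError on deletion).

```python
from collections.abc import MutableMapping


class RestrictedRegistry(MutableMapping):
    """Add-only mapping from tag names to descriptors (illustrative sketch)."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, name):
        return self._items[name]

    def __setitem__(self, name, descriptor):
        if not isinstance(name, str):
            raise TypeError('tag names must be strings')
        if name in self._items:
            raise KeyError(f'tag already registered: {name!r}')
        self._items[name] = descriptor

    def __delitem__(self, name):
        raise TypeError('registered tags can not be removed')

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


registry = RestrictedRegistry()
descriptor = object()
registry['shout'] = descriptor
registry['yell'] = descriptor   # the same descriptor under a second name
```

Basing the class on MutableMapping means that update and similar bulk operations go through the same checks as item assignment.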
Engine State¶
Both callbacks of a TagDescriptor accept an
EngineState object as their first argument,
which supports stateful tag processing. The engine state provides a mutable
container for arbitrary attributes. Each TagDescriptor can add,
remove and modify attributes of the state object to communicate with itself,
the engine, and other tags.
As a rule, objects should prefer to delete state attributes rather than setting
them to None. This meshes well with the fact that
EngineState provides a containment check. For
example, to check if the parser is in the middle of a run of text, descriptors
should check
if 'run' in state: ...
The built-in tags and the engine use a set of attributes and methods to operate
properly. Modifying these predefined attributes in a way other than explicitly
documented will almost inevitably lead to unexpected behavior. Properties are
used instead of simple attributes in a few cases to provide sanity checks for
the supported modifications. Custom tags can add, remove and modify any
additional attributes they choose. The full list of built-in attributes is
available in the EngineState documentation.
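The delete-rather-than-None convention pairs naturally with a containment check. The class below is a hypothetical stand-in for the engine state, assuming the check is equivalent to hasattr. It is meant only to show why deleting an attribute keeps the `in` test meaningful.

```python
class SketchState:
    """Stand-in state object whose `in` operator tests for attributes."""

    def __contains__(self, name):
        return hasattr(self, name)


state = SketchState()
state.run = object()            # a run is in progress
in_run_before = 'run' in state
del state.run                   # preferred over `state.run = None`
in_run_after = 'run' in state
```

Had the attribute been set to None instead, `'run' in state` would still be true and every consumer would need a second check.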
The API¶
The imprint.core package contains the Imprint
Engine Layer. The tags and
state modules implement most of the
functionality useful to end-users through the public XML Tag API. The
parsers and utilities
contain the Internal API.
The imprint.core.tags module implements the base
XML Tag API, as well as all the predefined
Built-in Tag Descriptors and Reference Descriptors.
The following members are used to construct and register new tags:
A limited mapping type that contains all the currently registered tag descriptors.
Registering a new descriptor is as easy as doing:
tag_registry[name] = descriptor
The registry is a restricted mapping type that supports adding new elements only if they are not already registered. Existing elements can not be deleted. Deletion operations will raise a TypeError, while overwriting existing keys will raise a KeyError. Aside from that, all operations supported by dict are allowed (including things like update).
Any tag that is referenceable by design (has a valid reference attribute) will have the ReferenceDescriptor’s registration hook invoked after the tag-proper is registered.
The built-in tags are registered when the current module is imported.
The basis of the tag API.
Instances of this class contain the information required to process a custom tag. They must contain all of the attributes listed below, with the expected types. The elements in tag_registry may be delegate objects that supply only part of the attribute set. In that case, they are wrapped in a proxy as needed at runtime, never up-front. The reason for this is twofold:
- There may be stateful objects registered for multiple tags, and wrapping in a proxy will not allow the tags to share state. This would not be a problem, except it would be unexpected behavior.
- Some of the attributes may be dynamic properties (or other descriptors). Fixing the value once would completely defeat such behavior.
Creating an occasional wrapper around a delegate is not expected to be particularly expensive, even if it had to be done for every tag encountered in the XML file. On the other hand, it allows for some very flexible behaviors. At the same time, very few instances of wrapping should occur, since most tags will be implemented by extending this class and implementing it properly. The wrap method ensures that all extensions are passed through as-is.
All the Built-in Tag Descriptors are instances of children of this class.
A tri-state bool flag indicating whether the tag is allowed/expected to have textual content or not. The values are interpreted as follows:
- None: The tag may not have any content. It must be of the form <tag/> or <tag><otherTag>...</otherTag></tag>. Anything else will raise a fatal error. If tags is set to False, only the former form is allowed.
- False: The tag should not have content, but content will not raise an error. A warning will be raised instead.
- True: The tag is expected to have content, but the content may be empty.
Any value is allowed in a delegate. If defined, the value will be converted to bool if it is not None. Defaults to None if not defined.
A bool indicating whether or not nested tags are allowed within this one.
Any value is allowed in a delegate. If defined, the value will be converted to bool. Defaults to True if not defined.
A tuple of strings containing the names of required tag attributes. A tag encountered without all of these attributes will raise an error.
In a delegate, this may be a single string, an iterable of strings, None or simply omitted. Every element of an iterable must be a string, or a TypeError is raised immediately during construction. Defaults to an empty tuple if not defined.
A dictionary mapping the names of optional attributes to their default values. Optional attributes are ones that are expected to be present in processing, but have sensible defaults that can be used, meaning that they do not have to be specified explicitly in the XML Template.
In a delegate, this may be any mapping type, an iterable of strings, a single string, None or simply omitted. In the case of an iterable or individual string, all the defaults will be None. Iterables and mapping keys must be strings, or a TypeError will be raised during construction. Defaults to an empty dict if not defined.
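The accepted delegate forms for the optional attributes can be sketched as a normalization function. The function name is hypothetical and the code is an illustration of the rules just described, not the library's own conversion routine.

```python
from collections.abc import Mapping


def normalize_optional(value):
    """Convert the accepted delegate forms into a plain dict of defaults."""
    if value is None:
        return {}
    if isinstance(value, str):
        return {value: None}            # a single attribute name
    if isinstance(value, Mapping):
        result = dict(value)            # names mapped to explicit defaults
    else:
        result = {name: None for name in value}   # iterable of names
    for name in result:
        if not isinstance(name, str):
            raise TypeError(f'attribute names must be strings, got {name!r}')
    return result


from_string = normalize_optional('style')
from_list = normalize_optional(['style', 'width'])
from_mapping = normalize_optional({'width': '6in'})
```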
The name of the attribute containing the data configuration name for the tag. This should only be provided for tags that require Data Configuration. If provided, this tag will automatically be added to the required sequence.
In a delegate, this object must be an instance of str or None. Defaults to None if not defined.
A ReferenceDescriptor that is only present if this type of tag can be the target of a reference.
Examples of referenceable built-in tags are <figure>, <table> and sometimes <par>. Referenceable tags can have an optional role attribute that changes the type of reference they represent. See the Roles description for more information.
In a delegate, this object must be an instance of ReferenceDescriptor or None. Defaults to None if not defined.
After completion, this instance has all of the required attributes defined in the delegate, wrapped in the required types.
A reference to the delegate object is not retained. This method can be invoked multiple times. It updates the current descriptor with the attributes of the delegate, leaving undefined attributes in the delegate untouched.
Create an empty instance, with all required attributes set to default values.
This method is provided to allow bypassing the default __init__ in child classes. All arguments are ignored.
Each descriptor should provide a method with this signature to process closing tags.
If implemented, this method must accept the Engine State, a tag name and a dict of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag. The attributes are the same as those passed to start, barring any modifications made in start.
Descriptors that have a non-None data_config attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Each descriptor should provide a method with this signature to process opening tags.
If implemented, this method must accept the Engine State, a tag name and a dict of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag.
Descriptors that have a non-None data_config attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Construct a proxy from the descriptor if it isn’t already one.
This method is provided so that when TagDescriptor objects are implemented properly up front, they do not need to be wrapped in an additional layer.
If the input is a delegate, the return value will always be of the type that this method was invoked on. However, the type check will always be done against the base TagDescriptor class.
Bases: imprint.core.tags.TagDescriptor
The base class of all the built-in TagDescriptor implementations.
Custom tag implementations are welcome to use this class as a base instead of a raw TagDescriptor.
Updates the required fields with the keywords that are passed in.
If no delegate object (or None) is supplied, bypass the default constructor (see TagDescriptor.__new__). kwargs will override any defaults and attributes set by a delegate.
Built-in Tag Descriptors¶
The existing tag descriptors implement the XML Template Specification:
Bases: imprint.core.tags.BuiltinTag
Implements the <break> tag.
Insert a page break into the document.
Bases: imprint.core.tags.BuiltinTag
Implements the <expr> tag.
Warning
This descriptor uses eval to execute arbitrary code and assign it to a new keyword. Use with extreme caution!
Evaluate the expression found inside the tag, and add a new entry to the state’s keywords.
The content_stack will be popped.
All errors in importing and evaluation will be propagated up and will terminate the parser.
Begin a new expression.
This just pushes a new content_stack entry in the state. All content until the closing tag will be evaluated as a set of Python statements.
Bases: imprint.core.tags.BuiltinTag
Implements the <figure> tag.
Generate and insert a figure based on the selected handler.
Figures can appear in a run, a paragraph, or on their own.
Just log the tag.
Bases: imprint.core.tags.BuiltinTag
Implements the <kwd> tag.
Find the value of the keyword in the state’s keywords and place it into the current content.
If the keyword is not found, a KeyError will be raised. If the tag has a format attribute, it is interpreted as a format_spec, and used to convert the value. If the attribute is not present, the value is converted with a simple call to str.
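The conversion rule for keywords maps directly onto Python's built-in format function. The helper below is a stand-alone sketch with a hypothetical name. It mirrors the described behavior without touching the engine state or content buffer.

```python
def render_keyword(keywords, name, format_spec=None):
    """Look up a keyword and convert it to text (illustrative sketch)."""
    value = keywords[name]              # raises KeyError if the keyword is missing
    if format_spec is None:
        return str(value)               # no format attribute: plain str()
    return format(value, format_spec)   # format attribute used as a format_spec


keywords = {'Ratio': 0.5, 'Count': 12}
as_percent = render_keyword(keywords, 'Ratio', '.1%')
as_plain = render_keyword(keywords, 'Count')
```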
Bases: imprint.core.tags.BuiltinTag
Implements the <latex> tag.
Convert the equation in the text of the current tag into an image using haggis.latex_util.render_latex, and insert the image into the parent tag.
The parent can be a run or a paragraph. If the requested run style does not match the current run, the current run will be interrupted by a run containing a new picture with the requested style, and resumed afterwards. If there is no run to begin with, a new run will be created, but not stored in the run attribute of the state.
Formulas are rendered at 96dpi in JPEG format by default.
Begin a new LaTeX formula.
Just push a new content_stack entry into state. All content until the closing tag is evaluated as a LaTeX document.
Bases: imprint.core.tags.BuiltinTag
Implements the <n> tag.
Add a line break to the current run.
If not inside a run, append the break to the last run. Make a new run only at the start of a paragraph. Ignore with a warning outside of a paragraph.
Bases: imprint.core.tags.BuiltinTag
Implements the <par> tag.
Validate the list attribute that is found.
Log an error if the attribute is invalid, but do not terminate processing. The attribute is simply ignored if the list is neither numbered, bulleted nor continued.
Return the type normalized to a ListType, or None if not a list item. If the type is valid, and list-level is set, it is converted to an integer.
Compute the paragraph style based on whether an explicit style is set in the attributes, and whether or not the paragraph is a list.
1. If an explicit style is requested, return it. Otherwise:
2. If the paragraph is not a list, return the default paragraph style. Otherwise:
3. If the previous paragraph is a list item in the same list (i.e., the current list-level attribute is non-zero), return the style of the previous paragraph. Otherwise:
4. Return the default list item style.
Parameters:
- state (EngineState) – The state is used to check for the previous item’s style in case #3.
- attr (dict) – The tag attributes, used to check for an explicitly set style as well as for a style reset with list-level = 0.
- list_type (ListType or None) – The type of the list, if a list at all, as returned by check_list.
Terminate the current paragraph.
See end_paragraph in EngineState.
Terminate any existing paragraph, flush all text and start a new paragraph.
If the new paragraph is a list item, add the necessary metadata to it.
Issue a warning if an existing paragraph is found.
Bases: imprint.core.tags.BuiltinTag
Implements the <run> tag.
Place any remaining text into the current run, and remove the run attribute of state.
Create a new run, ensuring that there is a paragraph to go with it.
Creating a run outside a paragraph raises a warning and creates a paragraph with a default style. See imprint.core.state.EngineState.new_run.
Bases: imprint.core.tags.BuiltinTag
Implements the <section> tag.
Begin a new section in the document, optionally altering the page orientation.
Bases: imprint.core.tags.BuiltinTag
Implements the <skip> tag.
Bases: imprint.core.tags.BuiltinTag
Implements the <string> tag.
Generate a string based on the appropriate handler.
If the log_images key is set to a truthy value in state.keywords, the content will also be dumped to a file.
Just log the tag.
Bases: imprint.core.tags.BuiltinTag
Implements the <table> tag.
Generate and insert a table based on the selected handler.
The handler creates the table directly in the document (unlike for figures, where only the final product is inserted). Any error that occurs mid-processing leaves a stub table in the document in addition to the automatically-inserted alt-text.
Tables appear on their own, outside any paragraph or run, so if a table is nested in a run or paragraph, a warning will be issued. Any interrupted run or paragraph resumes after the table with their prior styles.
Just log the tag.
Bases: imprint.core.tags.BuiltinTag
Implements the <toc> tag.
Terminate and insert the TOC.
Gather any text that has been acquired into the heading, which will be a separate paragraph preceding the TOC.
If the TOC interrupted an existing paragraph, a new paragraph will be resumed with the same style as the original. If a run style is present as well, a run will be recreated too.
Create a new TOC.
Log a warning if the tag appears within a paragraph. Truncate the paragraph, and resume with the prior style. The same happens to the current run, if there is one.
Bases: imprint.core.tags.BuiltinTag
Implements the <figure-ref> and <table-ref> tags.
This processor is not registered explicitly. It gets added by all of the target tags that use it as part of their registration process. Registering this processor under a name that does not end in '-ref' will lead to a runtime error in resolve.
Insert a string with the specified reference into the current content.
Returns a quasi-singleton instance of the current class.
This instance is not exposed directly, but it is registered by the built-in referenceable tags.
Overridable operation for fetching and logging the reference that is to be inserted.
The default is to look up the reference by 'id' in the imprint.core.state.EngineState’s references.
Used by the default implementation of end.
Bases: imprint.core.tags.ReferenceProcessor
Implements the <segment-ref> tag.
This is a special case of ReferenceProcessor that allows access by both title and id. Its references always resolve to a <par> tag, or a tag playing that role.
Resolve a segment reference by either text or ID.
Either the id or title tag attribute must be present. If both are present, they must resolve to the same heading in the document or an error is raised.
Reference Descriptors¶
Defines the process for creating References and using them through the appropriate tag.
References are made by processing the XML Template and mapping out any referenceable tags using the start and end methods. In the default implementation, the reference text is created by the make_reference method, invoked from end.
start and end return a boolean value to allow custom tags to be processed selectively. A return value of False from either method means that the specific instance of the tag being processed is not a valid reference target. Normally both methods always return True, but for the builtin <par> tag, for example, an exception must be made.
References are placed into the document by a special TagDescriptor, which is generally registered along with the parent tag that contains a ReferenceDescriptor using the register method.
Current references are purely textual, rather than having a dynamic field assigned to them. This is still a work in progress.
The prefix that normally gets prepended to the reference text. Used by make_reference to construct the output string. Extensions are welcome to ignore this attribute.
A string or iterable of strings that lists the attributes that are used to identify the target for this reference type. The attribute may be either required or optional for the target tag, but it must be recognized either way. This attribute is used to check for attributes on tags with a non-default role. Defaults to 'id'.
Process the closing tag for a referenceable tag.
The default is to add the reference to the appropriate map in references by ID, based on the role, and log the operation. The attribute id is required.
The actual reference is created by make_reference.
Returns True if the tag is definitely a reference target, False if not.
-
identifiers
Ensure that identifiers is read-only.
Returns a string referring to the specified tag in the specified role.
Keep in mind that the ReferenceDescriptor is selected based on the role, not necessarily the tag name. Therefore, the role argument should always be the “computed” role: the name of the tag should be overridden by the value of the attribute, if it was specified.
A registration hook that is invoked when the parent TagDescriptor is registered.
The default implementation registers an additional TagDescriptor under the name name + '-ref', which replaces the <name-ref/> tag with the formatted reference. See ReferenceProcessor.
Parameters:
- registry – The tag registry that the parent TagDescriptor is being inserted into. See tag_registry for details on the interface.
- name (str) – The name under which the parent tag is being registered.
- descriptor – The parent object being registered, not necessarily a TagDescriptor. The TagDescriptor.wrap method can be used to retrieve the corresponding TagDescriptor if necessary.
Check that the reference identified by key does not already exist and set it.
Duplicate reference targets cause an error, unless duplicates is True, in which case a warning is logged and the new value is discarded.
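The duplicate handling can be sketched as follows. The function name, its signature and the choice of KeyError are assumptions for illustration, since the document does not state which error type is raised.

```python
import logging


def set_reference(mapping, key, value, duplicates=False,
                  log=logging.getLogger(__name__)):
    """Store a reference target unless the key is already taken (sketch)."""
    if key in mapping:
        if not duplicates:
            # Assumption: the real error type is not specified in this document.
            raise KeyError(f'duplicate reference target: {key!r}')
        log.warning('Ignoring duplicate reference target %r', key)
        return                          # the new value is discarded
    mapping[key] = value


targets = {}
set_reference(targets, 'fig1', 'Figure 1')
set_reference(targets, 'fig1', 'Figure 2', duplicates=True)
```

After both calls, the first value registered for `fig1` is the one that remains.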
Bases: imprint.core.tags.ReferenceDescriptor
Extension of ReferenceDescriptor to accumulate heading text and allow references through the title attribute.
Used by <par> tags to create heading references.
A class-level regular expression for identifying the <par> tags that represent referenceable headings.
Create a dual reference based on the title and optional ID in addition to the default logging.
Ensure that identifiers is read-only.
Add the section heading to the usual reference text.
Register a SegmentRefProcessor for the <segment-ref> tag.
This registration hook uses a fixed name, so can only be called once.
Check that the reference identified by key does not already exist and set it.
Duplicate reference targets cause an error, unless duplicates is True, in which case a warning is logged and the new value is discarded.
Start accumulating content in addition to the default logging.
If an actual <par> tag is encountered (as opposed to a tag playing that role), and the heading matches Heading \d+, the current heading is incremented in the state.
If any heading tag, or any tag with role="par" is encountered, a new reference will be created. Non-heading paragraphs with no explicit role are non-referenceable. A non-heading paragraph can be made referenceable by explicitly setting the role.
Keep in mind that the title for a segment reference is accumulated from all the text in the paragraph. Use carefully with non-default tags.
Utility Functions¶
Resolve the value of key with respect to attr, but with the option to override by the data configuration dictionary.
If the final value is sentinel, return default instead. Return default if key is missing entirely as well. Both attr and data must be mapping types that support a get method.
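One way to read this resolution rule is sketched below. The function name and signature are hypothetical, and the handling of corner cases (for example, a sentinel value in data masking a real value in attr) is an assumption, not a statement about the library.

```python
_MISSING = object()   # private marker for "key not present at all"


def resolve_key(key, attr, data, sentinel=None, default=None):
    """Look up key in data first, then attr; map sentinel/missing to default."""
    value = data.get(key, _MISSING)
    if value is _MISSING:
        value = attr.get(key, _MISSING)
    if value is _MISSING or value is sentinel:
        return default
    return value


attr = {'width': '3in', 'height': None}
data = {'width': '5in'}
width = resolve_key('width', attr, data)                 # data overrides attr
height = resolve_key('height', attr, data, default='2in')
depth = resolve_key('depth', attr, data, default='n/a')  # missing everywhere
```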
Convert a string, number or pre-constructed size to a docx.shared.Length object, using get_key for value resolution.
Common options for key are 'width' and 'height'.
Valid unit suffixes are ", in, cm, mm, pt, emu, twip. The default when no units are specified is inches (").
Retrieve and load the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged and a KnownError is raised.
Load and run the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged, as with get_handler.
All exceptions that occur during execution are converted into KnownError.
Compute the required styles based on attr and data configurations.
Style keys are taken from the keys of defaults, while values provide the fallback names used if the keys do not appear in either attr or data. Similarly named keys in data will override ones in attr.
Create a dictionary with keys width and height and values that are instances of docx.shared.Length.
Values are resolved according to the rules of get_key, with width_key and height_key as the inputs. String values may contain units, and will be parsed according to get_size.
If neither key is present in either configuration (or present but set to None), set the width to default_width. If that is None as well, return an empty dictionary.
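The unit handling described for get_size can be illustrated without python-docx by converting to English Metric Units (EMU), the integer unit on which docx.shared.Length is based. The function name and the regular expression are hypothetical. The conversion factors are the standard ones (914400 EMU per inch, 12700 per point, 635 per twip).

```python
import re

# EMU per unit; the bare double quote is the inch suffix.
EMU_PER_UNIT = {
    '"': 914400, 'in': 914400, 'cm': 360000, 'mm': 36000,
    'pt': 12700, 'emu': 1, 'twip': 635,
}

_SIZE = re.compile(r'\s*([-+]?\d*\.?\d+)\s*("|in|cm|mm|pt|emu|twip)?\s*')


def parse_size_emu(text):
    """Parse a size such as '2.5cm' or '3' into a whole number of EMU."""
    match = _SIZE.fullmatch(str(text))
    if match is None:
        raise ValueError(f'cannot parse size: {text!r}')
    number, unit = match.groups()
    # No suffix means inches, as in the description of get_size.
    return round(float(number) * EMU_PER_UNIT[unit or '"'])


one_inch = parse_size_emu('1in')
two_inches = parse_size_emu(2)
seventy_two_points = parse_size_emu('72pt')
```

In real code the resulting integer could be wrapped in a length object. Here it is left as a plain int to keep the sketch self-contained.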
Parser State Objects¶
The imprint.core.state module supplies the state objects that
enable communication within the Engine Layer
between the engine itself and the tags. The state is therefore crucial
to the XML Tag API without being completely a part of it.
-
class
imprint.core.state.EngineState(doc, keywords, references, log)¶ A simple container type used by the main parser to communicate document state to the tag descriptors.
Most of the state is dedicated to monitoring the status of the text acquisition from the XML. The engine and built-in tags rely on a set of attributes to function. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using in in preference to hasattr.
-
doc¶ -
The document that is being built. Set once by the engine.
Implemented as a read-only property.
-
keywords¶ -
The keywords configured for this document by the IPC File. Normally, this dictionary should be treated as read-only, but ExprTag can add new entries.
As a rule, keywords with lowercase names are system configuration options, while keywords that start with upper case letters affect document content.
Implemented as a read-only property.
-
references¶ -
A multi-level mapping type that allows references to be fetched by role and attribute. Access to this map is performed by providing a tuple (role, attribute, key). For example:
state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using str.
The mapping is made immutable as soon as it becomes part of the state. The read-only lock is irreversible.
Implemented as a read-only property.
-
paragraph¶ -
A paragraph represents a collection of runs and other objects that make up a logical segment in a document. This attribute exists only when parsing a <par> tag. Usually set and unset by ParTag, but can be temporarily switched off and reinstated in response to other tags as well.
end_paragraph deletes this attribute.
-
run¶ -
A run is a collection of characters with similar formatting within a paragraph. This attribute exists only when parsing a <run> tag. Usually set and unset by RunTag.
end_paragraph deletes this attribute.
-
content¶ -
A mutable buffer used by the engine to accumulate text from the XML Template.
Since whitespace needs to be trimmed rather aggressively from an XML file, this object gets an extra (non-standard) attribute:
-
content.leading_space¶ Indicates whether or not to prepend a space when concatenating this buffer with others. In general, the text of the first run in a paragraph is the only one that does not have this attribute set to True. This flag is set on the buffer rather than the state object itself so that buffers can be pushed and popped into the content_stack to handle nested tags.
This attribute should be manipulated mostly through the new_content, get_content and flush_run methods.
This attribute must always be present, regardless of the position within the document.
Implemented as a read-write property that can not be deleted or set to None.
-
content_stack¶ collections.deque[io.StringIO]
A stack for nested content buffers. Each buffer represents a tag containing independent content. Some tags append to the parent’s buffer, some close the current buffer to start a new one and others, such as <figure>, use a temporary buffer for their content.
The stack allows for a theoretically indefinite level of nesting of text elements. In reality, it will only contain one or two elements: the current run text and the contents of interspersed tags like <figure>.
This attribute should be manipulated through the push_content_stack and pop_content_stack methods.
This attribute may be empty, but never missing. Implemented as a read-only property.
-
last_list_item¶ -
List items in Word are just paragraphs with a particular style and numbering scheme. All of this information can be gathered from the previous paragraph that was assigned a concrete list numbering instance.
This attribute should never be missing. It should only be
None to indicate that no prior numbered paragraph has occurred in the document yet. To this end, it is implemented as a read-only property.
-
latex_count¶ -
A counter for the number of <latex> tags encountered so far. Used to generate the file name for the equations if Image Logging is enabled. Missing otherwise.
-
__contains__(name)¶ Checks if the specified name represents an attribute.
-
check_content_tail()¶ Include any remaining text in
content in the last run of the last paragraph.
This ensures that paragraphs get truncated properly, and that spurious text between paragraphs is cleaned up.
A warning is issued if any non-whitespace text is found.
-
end_paragraph(tag=None)¶ Terminate the current paragraph.
Any existing run is immediately terminated. Spurious text is appended to the last available run. Both
paragraph and run attributes are deleted by this method.
If there is no paragraph to terminate, this method is equivalent to calling
check_content_tail.
Parameters: tag (str or None) – The name of a tag that interrupts the paragraph. If present, a warning will be issued. If omitted, no warning will be issued.
-
flush_run(renew=True, default='')¶ Flush the text buffer accumulating the current run into the document.
Text flushing aggressively removes whitespace from around individual lines. A single space character is prepended before the text if
content.leading_space is True.
If not inside a run, this is a no-op.
Parameters:
- renew (bool) – Whether or not to create a new text buffer when finished.
This is generally a good idea, since the content will
already be in the document, so the default is
True. The new buffer has leading_space set to True.
- default (str) – The text to insert if the current
content buffer is empty. Defaults to nothing ('').
-
get_content(default='')¶ Retrieve the text in the current
content buffer.
Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is empty (or contains only whitespace), return default instead.
If the text is non-empty, and
content has leading_space set to True, a space is prepended.
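The stripping rules described for get_content can be illustrated with a small stand-alone sketch. This is not the engine's implementation: ContentBuffer and get_content_sketch are hypothetical names, and dropping lines that become empty after stripping is an assumption made here for illustration.

```python
import io


class ContentBuffer(io.StringIO):
    """Hypothetical stand-in for the engine's content buffer."""

    # Extra, non-standard flag described for content.leading_space.
    leading_space = False


def get_content_sketch(buffer, default=''):
    # Strip whitespace from each line, then recombine with single spaces.
    # Skipping lines that end up empty is an assumption of this sketch.
    lines = (line.strip() for line in buffer.getvalue().splitlines())
    text = ' '.join(line for line in lines if line)
    if not text:
        # Empty or whitespace-only buffers fall back to the default.
        return default
    # The leading space is only applied to non-empty text.
    return ' ' + text if buffer.leading_space else text


buf = ContentBuffer()
buf.write('  first line  \n\n   second line\n')
buf.leading_space = True
result = get_content_sketch(buf)
```

With leading_space left at False, the same buffer would yield the text without the initial space.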
-
image_log_name(id, ext='')¶ Create an output name to log an image (or data), for a Data Configuration with the given ID, and an optional extension.
This is the standard name-generator for any component (tag descriptor or plugin handler) that enables image logging in response to log_images.
The base name is the result of concatenating an extension-less log_file (or output_docx if not set), with
id, separated by an underscore. ext is appended as-is, if provided.
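The naming rule can be sketched as follows. The function below is a hypothetical illustration: it takes the base file name as an explicit argument, whereas the real method reads log_file or output_docx from the engine's configuration.

```python
import os.path


def image_log_name_sketch(base_file, id, ext=''):
    # Drop the extension of the base file, then append "_<id>" followed
    # by the extension exactly as given (no dot is added automatically).
    root, _ = os.path.splitext(base_file)
    return f'{root}_{id}{ext}'


name = image_log_name_sketch('report.docx', 'fig_signal', '.png')
```

Here name would be the string report_fig_signal.png. Because ext is appended as-is, the caller is responsible for including the leading dot.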
-
inject_par(style='Default Paragraph Font', pstyle='Normal', text='')¶ Insert a new paragraph into the document with the specified styles and text, and return it.
The contents of the paragraph will be a single run with the specified text. Any previously existing
paragraph and run will be terminated (see end_paragraph) and reinstated with their prior styles once the new content is inserted.
Parameters: Returns:
-
insert_picture(img, flush_existing=True, style='Default Paragraph Font', pstyle='Quote', **kwargs)¶ Insert an image into the current document.
Images must be inserted into a run, so the following cases are recognized:
- Outside <par>: Create a new temporary Paragraph and a new Run. Neither object is retained (i.e. in paragraph and run).
- Inside <par> but outside <run>: Create a new temporary Run, which will not be retained.
- Inside <run>: If the requested style matches the style of the current run, it will be flushed and extended. Otherwise, the current run will be interrupted by a temporary run with the new style, and then reinstated.
It is an error to have a run outside a paragraph.
Parameters:
- img (str or file-like) – The image can be the name of a file on disk, or an open file (including in-memory files like io.BytesIO). In the latter case, the file pointer must be at the beginning of the image data.
- style (str) – The name of the Character Style to apply to a new run.
- pstyle (str) – The name of the Paragraph Style to apply if a new paragraph needs to be created.
Two additional keyword-only arguments can be supplied to
add_picture: width and height.
-
interrupt_paragraph(warn=None)¶ A context manager for interrupting the current run/paragraph and resuming it when complete.
The current paragraph and run are ended before the body of the
with block executes. They are reinstated afterwards, if they existed to begin with, with the same styles as before.
Parameters: warn (str, bool or None) – If a boolean, determines whether or not to issue a generic warning if a paragraph is actually interrupted. If a string, it is interpreted as the name of the tag that is interrupting the paragraph, and mentioned in the warning. No warning will be issued if falsy. Defaults to None.
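The interrupt-and-resume pattern can be sketched with contextlib. This is a simplified illustration rather than the engine's code: the state object is a plain namespace, the paragraph and run attributes hold style names instead of document objects, and reinstating in a finally clause is a choice made for this sketch.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def interrupt_paragraph_sketch(state):
    # Remember the styles of whatever is currently open (None if absent).
    par_style = getattr(state, 'paragraph', None)
    run_style = getattr(state, 'run', None)
    # End the run first, then the paragraph, by deleting the attributes.
    for name in ('run', 'paragraph'):
        if hasattr(state, name):
            delattr(state, name)
    try:
        yield
    finally:
        # Reinstate only what existed before, with the same styles.
        if par_style is not None:
            state.paragraph = par_style
        if run_style is not None:
            state.run = run_style


state = SimpleNamespace(paragraph='Normal', run='Default Paragraph Font')
with interrupt_paragraph_sketch(state):
    # Neither attribute exists inside the block.
    inside = (hasattr(state, 'paragraph'), hasattr(state, 'run'))
```

After the with block, both attributes are present again with their earlier style names.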
-
log(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log. XML location meta-data will be inserted into any log messages.
-
new_content(leading_space=None)¶ Update the
content text buffer to a new, empty StringIO.
Calling this method is faster than doing a seek-truncate according to http://stackoverflow.com/a/4330829/2988730.
Parameters: leading_space (tri-state bool) – If None, copy leading_space from the current content. Otherwise, set to the provided value. The default is to copy the existing value.
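A minimal sketch of the tri-state behavior, assuming a hypothetical ContentBuffer subclass of io.StringIO that carries the leading_space flag. The function returns the new buffer instead of assigning it to the state, which is a simplification.

```python
import io


class ContentBuffer(io.StringIO):
    """Hypothetical stand-in for the engine's content buffer."""

    leading_space = False


def new_content_sketch(current, leading_space=None):
    # Replace the buffer with a fresh object instead of rewinding the
    # old one with current.seek(0) followed by current.truncate(0).
    fresh = ContentBuffer()
    # None means "inherit the flag from the buffer being replaced".
    fresh.leading_space = (
        current.leading_space if leading_space is None else leading_space
    )
    return fresh


old = ContentBuffer('stale text')
old.leading_space = True
inherited = new_content_sketch(old)
overridden = new_content_sketch(old, leading_space=False)
```

The inherited buffer keeps leading_space set to True, while the overridden one has it set to False. Both start out empty.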
-
new_run(tag, style='Default Paragraph Font', pstyle='Normal', check_in_par=True, keep_par=True)¶ Create a new
run.
This method handles cases when a run is requested outside a paragraph, or inside an existing run:
- Nested runs are forbidden, but run injection is not.
- Existing content is flushed for injected runs.
- Runs outside a paragraph will generate a temporary paragraph
with a default style.
- Missing paragraphs can optionally raise a warning.
- The temporary paragraph can optionally be retained as the current paragraph.
Parameters:
- tag (str) – The name of the tag requesting the run. If there is already
a
run attribute present, setting tag='run' will raise an error because of nesting.
- style (str) – The name of the style to use for the new run.
- pstyle (str) – The name of the style to use for a new paragraph, if one has
to be created. Moot if there is already a
paragraph attribute.
- check_in_par (bool) – Whether or not to warn if not in a paragraph. Defaults to
True.
- keep_par (bool) – Whether or not to retain a newly created paragraph object in
the
paragraph attribute. Moot if there is already a paragraph attribute.
Returns:
- par (docx.text.paragraph.Paragraph) – The paragraph that the run was added to. If
keep_par is True or there was already a paragraph attribute set, this will be the paragraph attribute.
- run (docx.run.Run) – The newly created run. This will be set to the
run attribute unless there is no existing paragraph attribute, and keep_par is set to False.
Notes
Setting
keep_par to False for a <run> tag outside a paragraph will cause a situation where run is set but paragraph is not. This may cause a problem for the engine, but should never arise with the builtin parsers.
-
number_paragraph(list_type, level)¶ Turn the current paragraph into a list item, and store it into
last_list_item.
The exact numbering scheme depends on
last_list_item, which will be updated to refer to the current paragraph when this method completes.
The following behaviors occur in response to
list_type:

=========  ========
list_type  Behavior
=========  ========
None       Not a list paragraph. Do not set numbering or change last_list_item.
CONTINUED  Same type and numbering as last_list_item. Set last_list_item.
NUMBERED   Start a new numbered list. Set last_list_item.
BULLETED   Start a new bulleted list. Set last_list_item.
=========  ========

Parameters:
-
pop_content_stack()¶ Reinstate the previous level of the
content_stack as the current content.
Calling this method on an empty stack will cause an error. The current
content is completely discarded.
-
push_content_stack(flush=False, leading_space=False)¶ Temporarily create a new text buffer for the
content.
If
flush is True, the old buffer is flushed to the document and cleared before being pushed to the content_stack. If flush is False, the existing buffer is pushed unchanged.
If the existing buffer is flushed, its leading_space attribute is set to True, so the buffer that is reinstated when the new one is popped will have
leading_space set to True.
The new buffer can have its
leading_space attribute configured by the leading_space parameter, which defaults to False.
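The interplay between push_content_stack and pop_content_stack can be sketched with a small stand-in class. StateSketch is hypothetical and models only the flush=False case. Flushing, the leading_space flag and the read-only properties of the real state object are left out.

```python
import io
from collections import deque


class StateSketch:
    """Hypothetical stand-in holding only the content and its stack."""

    def __init__(self):
        self.content = io.StringIO()
        self.content_stack = deque()

    def push_content_stack(self):
        # Save the current buffer unchanged and start a new, empty one.
        self.content_stack.append(self.content)
        self.content = io.StringIO()

    def pop_content_stack(self):
        # Discard the current buffer and reinstate the previous one.
        # deque.pop raises IndexError if the stack is empty.
        self.content = self.content_stack.pop()


state = StateSketch()
state.content.write('run text')
state.push_content_stack()
state.content.write('figure caption')
nested = state.content.getvalue()
state.pop_content_stack()
restored = state.content.getvalue()
```

A nested tag such as <figure> would do its work between the push and the pop, so its text never mixes with the text of the surrounding run.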
-
temp_run(style='Default Paragraph Font', pstyle='Normal', keep_same=False)¶ Create a temporary run in the current context.
The run and paragraph styles will be preserved after the context manager exits. If the run is injected outside a paragraph, a temporary paragraph will be created and forgotten.
Within the context manager, both
paragraph and run are guaranteed to be set. run will have the style named by style, but paragraph will only have the style named by pstyle if it is a temporary paragraph.
All content is flushed into the temporary run when this manager exits.
Parameters:
- style (str) – The style of the new run.
- pstyle (str) – The style of a new paragraph to contain the run. Used only
if
paragraph is unset.
- keep_same (bool) – If
True, and a run already exists, and has the same style as this one, retain it instead of making a new one. If False (the default), always create a new run.
-
-
class
imprint.core.state.ReferenceState(registry, log, heading_depth=None)¶ A simple container type used by the reference parser to communicate state to the reference descriptors and accumulate the reference map.
Most of the state is dedicated to monitoring referenceable tags and creating references to them. The engine and built-in tags rely on a set of attributes to function properly. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using
in in preference to hasattr.
-
registry¶ Mapping
A subtype of
dict that follows the same rules as tag_registry. Normally a reference to that attribute.
Implemented as a read-only property.
-
references¶ -
A multi-level mapping type that allows references to be fetched and set by role and attribute. Access to this map is performed by providing a tuple
(role, attribute, key). For example:
state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using
str.
The map is mutable at this stage in the processing. It accumulates all the referenceable tags found in the document. Setting a value for a key any of whose levels do not exist is completely acceptable: the missing levels will be filled in.
Implemented as a read-only property.
-
heading_depth¶ -
The configured depth after which
heading_counter stops having an effect when a subheading is entered. If omitted entirely (None), all available heading levels will be used.
Implemented as a writable property.
-
heading_counter¶ -
A list containing counters for each heading level encountered. The list is popped back one element whenever a higher level heading is encountered.
len(heading_counter) is the depth of the outline the parser is currently in. E.g., if the parser is parsing text under Section 3.4.5, heading_counter contains [3, 4, 5]. When Section 4 is encountered next, the counter will be reset to [4]. The heading may be referenced later by title or by ID.
A
deque is not used because it does not support slice deletion, which makes jumping back a few heading levels much easier.
Implemented as a read-only property.
-
item_counters¶ -
A mapping of the referenceable roles to the counters of items in the current heading. All the counters are reset to zero when a new heading below
heading_depth is encountered.
Implemented as a read-only property. The keys of the mapping should not be modified, but the values may be.
-
content¶ -
A mutable buffer used by the engine to accumulate text from the XML Template only when necessary.
This attribute should be manipulated mostly through the
start_content and end_content methods. It should only be present for tags that care about accumulating content for a reference, like <par>. When present, all content, regardless of nested tags, will be accumulated.
-
__contains__(name)¶ Checks if the specified name represents an attribute.
-
end_content()¶ Terminate the current content buffer, if any, and return the content after aggressive stripping of whitespace.
If there is no
content buffer to begin with, an empty string is returned.
-
format_heading(prefix=None, prefix_sep=' ', sep='.', suffix_sep='-', suffix=None)¶ Format
heading_counter for display.
If suffix is set to a truthy value, only
heading_depth items are shown. Otherwise, the entire list is shown.
-
get_content(default='')¶ Retrieve the text in the current
content buffer.
Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is non-existent, empty or contains only whitespace, return default instead.
-
heading_counter Ensure that
heading_counter is read-only.
-
heading_depth Ensure that
heading_depth is set to a legitimate value.
-
increment_heading(level)¶ Increment
heading_counter at the requested level.
Any missing levels are set to 1 with a warning. Any further levels are truncated.
item_counters is reset if heading_depth is unset or greater than level.
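The counter update can be sketched on a plain list. This is an illustration under stated assumptions, not the engine's code: level is taken to be 1-based, the warning for missing levels is omitted, and the item_counters reset is not modeled.

```python
def increment_heading_sketch(counter, level):
    # Assumption: level 1 is the top-level heading.
    # Fill in any skipped intermediate levels with 1.
    while len(counter) < level - 1:
        counter.append(1)
    if len(counter) < level:
        # First heading at this level under the current parent.
        counter.append(1)
    else:
        counter[level - 1] += 1
        # Slice deletion drops every deeper level in one step.
        del counter[level:]
    return counter


# Text under Section 3.4.5, followed by a new top-level heading.
after_top_level = increment_heading_sketch([3, 4, 5], 1)
# A level-3 heading directly under Section 3 fills in the missing level 2.
after_skip = increment_heading_sketch([3], 3)
```

The del counter[level:] statement is the slice deletion that a deque would not support.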
-
item_counters Ensure that
item_counters is read-only.
-
log(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log. XML location meta-data will be inserted into any log messages.
-
registry Ensure that
registry is read-only.
-
reset_counters()¶ Set all the values of
item_counters to zero.
-
-
class
imprint.core.state.ReferenceMap¶ A multi-level mapping that stores references in the values.
Values are accessed through a three-level key
(role, attribute, key): For a given role, the type of key is determined by the attribute that names the target. Most tags only support attribute='id', but <segment-ref> also supports attribute='title'. key is the actual value of the attribute that is used to identify the reference.
Reference values can be any object whose
__str__ method returns the correct replacement text for the reference.
-
__contains__(key)¶ Checks if this mapping has the specified partial key.
Key may be a single string or a
tuple with a length between 1 and 3. Checks will be made for the appropriate depth.
-
__getitem__(key)¶ Retrieve the value for the specified three-level key.
-
static
__new__(cls, *args, **kwargs)¶ Ensure that the map is unlocked when it is first created.
This way calling
__init__ is not a trick for unlocking the map.
-
__setitem__(key, value)¶ If this mapping is not locked, set the attribute for the specified three-level key.
If any of the levels are new, they are created along the way.
-
__str__(indent=2)¶ Creates a pretty representation of this map, with indented heading levels.
-
lock()¶ Lock this mapping to prevent unintentional modification.
This is a one-time operation. There is no way to unlock. After locking,
__setitem__ will raise an error.
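The three-level access pattern and the one-way lock can be sketched with nested dictionaries. ReferenceMapSketch is a hypothetical illustration, not the library class. In particular, the use of TypeError for writes to a locked map is an assumption, since the documentation only states that an error is raised.

```python
class ReferenceMapSketch:
    """Hypothetical three-level map: role -> attribute -> key -> value."""

    def __init__(self):
        self._data = {}
        self._locked = False

    def __setitem__(self, key, value):
        if self._locked:
            # The concrete error type is an assumption of this sketch.
            raise TypeError('map is locked')
        role, attribute, name = key
        # Missing levels are created along the way.
        self._data.setdefault(role, {}).setdefault(attribute, {})[name] = value

    def __getitem__(self, key):
        role, attribute, name = key
        return self._data[role][attribute][name]

    def __contains__(self, key):
        # Accept a single string or a tuple of one to three levels.
        if isinstance(key, str):
            key = (key,)
        level = self._data
        for part in key:
            if part not in level:
                return False
            level = level[part]
        return True

    def lock(self):
        # One-way switch: this sketch offers no unlock method.
        self._locked = True


refs = ReferenceMapSketch()
refs['figure', 'id', 'my_figure'] = 'Figure 1.2-1'
text = str(refs['figure', 'id', 'my_figure'])
refs.lock()
```

Partial keys such as 'figure' or ('figure', 'id') can be tested with the in operator, and any assignment made after lock() fails.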
-