Python 3.6.5 Documentation >  "email.headerregistry": Custom Header Objects

"email.headerregistry": Custom Header Objects
*********************************************

**Source code:** Lib/email/headerregistry.py

======================================================================

New in version 3.6: [1]

Headers are represented by customized subclasses of "str". The
particular class used to represent a given header is determined by the
"header_factory" of the "policy" in effect when the headers are
created. This section documents the particular "header_factory"
implemented by the email package for handling **RFC 5322** compliant
email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for
applications to add their own custom header types.

When using any of the policy objects derived from "EmailPolicy", all
headers are produced by "HeaderRegistry" and have "BaseHeader" as
their last base class. Each header class has an additional base class
that is determined by the type of the header. For example, many
headers have the class "UnstructuredHeader" as their other base class.
The specialized second class for a header is determined by the name of
the header, using a lookup table stored in the "HeaderRegistry". All
of this is managed transparently for the typical application program,
but interfaces are provided for modifying the default behavior for use
by more complex applications.

The sections below first document the header base classes and their
attributes, followed by the API for modifying the behavior of
"HeaderRegistry", and finally the support classes used to represent
the data parsed from structured headers.

class email.headerregistry.BaseHeader(name, value)

*name* and *value* are passed to "BaseHeader" from the
"header_factory" call. The string value of any header object is
the *value* fully decoded to unicode.

This base class defines the following read-only properties:

name

The name of the header (the portion of the field before the
‘:’). This is exactly the value passed in the "header_factory"
call for *name*; that is, case is preserved.

defects

A tuple of "HeaderDefect" instances reporting any RFC compliance
problems found during parsing. The email package tries to be
complete about detecting compliance issues. See the "errors"
module for a discussion of the types of defects that may be
reported.

max_count

The maximum number of headers of this type that can have the
same "name". A value of "None" means unlimited. The
"BaseHeader" value for this attribute is "None"; it is expected
that specialized header classes will override this value as
needed.
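A minimal sketch of these read-only properties, calling a default "HeaderRegistry" directly instead of going through a message. The header value is an arbitrary example.

```python
from email.headerregistry import BaseHeader, HeaderRegistry

registry = HeaderRegistry()

# The registry returns an instance of a dynamically built class whose
# last base class is BaseHeader; the instance is also a str.
subject = registry('subject', 'Quarterly report')

print(isinstance(subject, BaseHeader))  # True
print(isinstance(subject, str))         # True
print(subject.name)       # subject  (case preserved exactly as passed)
print(subject.defects)    # ()  (no defects for a clean value)
print(subject.max_count)  # 1  (Subject maps to UniqueUnstructuredHeader)
```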

"BaseHeader" also provides the following method, which is called by
the email library code and should not in general be called by
application programs:

fold(*, policy)

Return a string containing "linesep" characters as required to
correctly fold the header according to *policy*. A "cte_type"
of "8bit" will be treated as if it were "7bit", since headers
may not contain arbitrary binary data. If "utf8" is "False",
non-ASCII data will be **RFC 2047** encoded.
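A short sketch comparing "fold()" under the default policy and under a clone with "utf8=True". The header text is invented, and the exact encoded-word form chosen by the folding code is not relied on here, only its general shape.

```python
from email import policy
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
subject = registry('Subject', 'Café menu')

# utf8 is False in the default policy, so the non-ASCII text is
# RFC 2047 encoded and the folded result is pure ASCII.
ascii_folded = subject.fold(policy=policy.default)

# With utf8=True the non-ASCII text is emitted unencoded.
utf8_folded = subject.fold(policy=policy.default.clone(utf8=True))

print(repr(ascii_folded))
print(repr(utf8_folded))
```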

"BaseHeader" by itself cannot be used to create a header object.
It defines a protocol that each specialized header cooperates with
in order to produce the header object. Specifically, "BaseHeader"
requires that the specialized class provide a "classmethod()" named
"parse". This method is called as follows:

parse(string, kwds)

"kwds" is a dictionary containing one pre-initialized key,
"defects". "defects" is an empty list. The parse method should
append any detected defects to this list. On return, the "kwds"
dictionary *must* contain values for at least the keys "decoded"
and "defects". "decoded" should be the string value for the header
(that is, the header value fully decoded to unicode). The parse
method should assume that *string* may contain content-transfer-
encoded parts, but should correctly handle all valid unicode
characters as well so that it can parse un-encoded header values.

"BaseHeader"’s "__new__" then creates the header instance, and
calls its "init" method. The specialized class only needs to
provide an "init" method if it wishes to set additional attributes
beyond those provided by "BaseHeader" itself. Such an "init"
method should look like this:

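def init(self, *args, **kw):
    # Consume the extra key that parse() added to kwds.
    self._myattr = kw.pop('myattr')
    # Pass the remaining arguments on to BaseHeader's init.
    super().init(*args, **kw)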

That is, anything extra that the specialized class puts into the
"kwds" dictionary should be removed and handled, and the remaining
contents of "kw" (and "args") should be passed to the "BaseHeader"
"init" method.
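As an illustration of this protocol, here is a sketch of a hypothetical "X-Priority" header class exposing an integer "priority" attribute; the class name, header name, and attribute are assumptions made for the example. Rather than filling in *kwds* by hand, its "parse" delegates to "UnstructuredHeader.parse", which also stores whatever other keys the base class expects, and then adds one extra key that "init" removes before calling "BaseHeader.init".

```python
from email import policy
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.message import EmailMessage


class PriorityHeader(UnstructuredHeader):
    """Hypothetical specialized class for an 'X-Priority' header."""

    max_count = 1  # at most one X-Priority header per message

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser fill in 'decoded' (and whatever
        # else it normally stores in kwds).
        super().parse(value, kwds)
        text = kwds['decoded'].strip()
        # Extra key consumed by init() below; None if not an integer.
        kwds['priority'] = int(text) if text.isdigit() else None

    def init(self, *args, **kw):
        # Remove our extra key, then pass the rest to BaseHeader.init.
        self._priority = kw.pop('priority')
        super().init(*args, **kw)

    @property
    def priority(self):
        return self._priority


registry = HeaderRegistry()
registry.map_to_type('X-Priority', PriorityHeader)
custom_policy = policy.default.clone(header_factory=registry)

msg = EmailMessage(policy=custom_policy)
msg['X-Priority'] = '3'
print(msg['X-Priority'].priority)  # 3
```

Registering the class with "map_to_type()" and installing the registry as the policy’s "header_factory" is what makes an assignment such as "msg['X-Priority'] = '3'" produce an instance of the new class.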

class email.headerregistry.UnstructuredHeader

An “unstructured” header is the default type of header in **RFC
5322**. Any header that does not have a specified syntax is treated
as unstructured. The classic example of an unstructured header is
the *Subject* header.

In **RFC 5322**, an unstructured header is a run of arbitrary text
in the ASCII character set. **RFC 2047**, however, has an **RFC
5322** compatible mechanism for encoding non-ASCII text as ASCII
characters within a header value. When a *value* containing
encoded words is passed to the constructor, the
"UnstructuredHeader" parser converts such encoded words into
unicode, following the **RFC 2047** rules for unstructured text.
The parser uses heuristics to attempt to decode certain non-
compliant encoded words. Defects are registered in such cases, as
well as defects for issues such as invalid characters within the
encoded words or the non-encoded text.

This header type provides no additional attributes.
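A minimal sketch of the decoding described above. The encoded word is hand-written for the example ("=C3=A9" is the UTF-8 encoding of "é" in RFC 2047 "Q" form).

```python
from email.headerregistry import BaseHeader, HeaderRegistry, UnstructuredHeader

registry = HeaderRegistry()

# An RFC 2047 encoded word followed by ordinary text.
subject = registry('Subject', '=?utf-8?q?Caf=C3=A9?= menu')

print(subject)                                     # Café menu
print(isinstance(subject, UnstructuredHeader))     # True
print(type(subject).__bases__[-1] is BaseHeader)   # True
```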

class email.headerregistry.DateHeader

**RFC 5322** specifies a precise format for dates within
email headers. The "DateHeader" parser recognizes that date format,
as well as recognizing a number of variant forms that are sometimes
found “in the wild”.

This header type provides the following additional attributes:

datetime

If the header value can be recognized as a valid date of one
form or another, this attribute will contain a "datetime"
instance representing that date. If the timezone of the input
date is specified as "-0000" (indicating it is in UTC but
contains no information about the source timezone), then
"datetime" will be a naive "datetime". If a specific timezone
offset is found (including *+0000*), then "datetime" will
contain an aware "datetime" that uses "datetime.timezone" to
record the timezone offset.

The "decoded" value of the header is determined by formatting the
"datetime" according to the **RFC 5322** rules; that is, it is set
to:

email.utils.format_datetime(self.datetime)

When creating a "DateHeader", *value* may be a "datetime" instance.
This means, for example, that the following code is valid and does
what one would expect:

msg['Date'] = datetime(2011, 7, 15, 21)

Because this is a naive "datetime" it will be interpreted as a UTC
timestamp, and the resulting value will have a timezone of "-0000".
Much more useful is to use the "localtime()" function from the
"utils" module:

msg['Date'] = utils.localtime()

This example sets the date header to the current time and date
using the current timezone offset.
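The following sketch exercises the three cases above: a string with an explicit offset, a naive "datetime", and "email.utils.localtime()". The dates are arbitrary examples.

```python
from datetime import datetime, timedelta
from email import utils
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# A string with an explicit offset yields an aware datetime.
received = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0200')
print(received.datetime.utcoffset())  # 2:00:00

# A naive datetime is treated as UTC with no source zone: -0000.
naive = registry('Date', datetime(2011, 7, 15, 21))
print(naive)                  # Fri, 15 Jul 2011 21:00:00 -0000
print(naive.datetime.tzinfo)  # None

# utils.localtime() returns an aware datetime for the current local time.
now = registry('Date', utils.localtime())
print(now.datetime.tzinfo is not None)  # True
```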

class email.headerregistry.AddressHeader

Address headers are one of the most complex structured header
types. The "AddressHeader" class provides a generic interface to
any address header.

This header type provides the following additional attributes:

groups

A tuple of "Group" objects encoding the addresses and groups
found in the header value. Addresses that are not part of a
group are represented in this list as single-address "Groups"
whose "display_name" is "None".

addresses

A tuple of "Address" objects encoding all of the individual
addresses from the header value. If the header value contains
any groups, the individual addresses from the group are included
in the list at the point where the group occurs in the value
(that is, the list of addresses is “flattened” into a
one-dimensional list).

The "decoded" value of the header will have all encoded words
decoded to unicode. "idna" encoded domain names are also decoded
to unicode. The "decoded" value is set by "join"ing the "str"
value of the elements of the "groups" attribute with "', '".

A list of "Address" and "Group" objects in any combination may be
used to set the value of an address header. "Group" objects whose
"display_name" is "None" will be interpreted as single addresses,
which allows an address list to be copied with groups intact by
using the list obtained from the "groups" attribute of the source
header.
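A sketch of "groups" and "addresses" for a value that mixes a single address with a group, followed by copying the address list through "groups". The addresses are placeholders.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
to = registry(
    'To', 'Ann <ann@example.com>, Team: bob@example.com, carol@example.com;')

# A single-address pseudo-group (display_name None) plus the real group.
print([g.display_name for g in to.groups])  # [None, 'Team']

# addresses flattens everything in order of appearance.
print([a.addr_spec for a in to.addresses])
# ['ann@example.com', 'bob@example.com', 'carol@example.com']

# Reusing the groups copies the address list with the group intact.
cc = registry('Cc', to.groups)
print(cc == to)  # True
```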

class email.headerregistry.SingleAddressHeader

A subclass of "AddressHeader" that adds one additional attribute:

address

The single address encoded by the header value. If the header
value actually contains more than one address (which would be a
violation of the RFC under the default "policy"), accessing this
attribute will result in a "ValueError".
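A minimal sketch of the "address" attribute, including the "ValueError" raised when the value holds two addresses. The addresses are placeholders.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

sender = registry('Sender', 'Ann Example <ann@example.com>')
print(sender.address.display_name)  # Ann Example
print(sender.address.domain)        # example.com

# Two addresses in a single-address header: constructing the header
# succeeds, but reading .address raises ValueError.
bad = registry('Sender', 'ann@example.com, bob@example.com')
error = None
try:
    bad.address
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```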

Many of the above classes also have a "Unique" variant (for example,
"UniqueUnstructuredHeader"). The only difference is that in the
"Unique" variant, "max_count" is set to 1.
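The effect of "max_count" shows up when headers are assigned to a message: "EmailMessage" asks its policy for the header’s "max_count" and rejects assignments beyond it. A sketch using the default policy, with placeholder values:

```python
from email.message import EmailMessage

msg = EmailMessage()  # uses email.policy.default
msg['Subject'] = 'First subject'

# Subject maps to UniqueUnstructuredHeader (max_count == 1), so a
# second Subject header is rejected.
error = None
try:
    msg['Subject'] = 'Second subject'
except ValueError as exc:
    error = exc

# Resent-To maps to the non-unique AddressHeader, so repeats are allowed.
msg['Resent-To'] = 'ann@example.com'
msg['Resent-To'] = 'bob@example.com'
print(len(msg.get_all('Resent-To')))  # 2
```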

class email.headerregistry.MIMEVersionHeader

There is really only one valid value for the *MIME-Version* header,
and that is "1.0". For future-proofing, this header class supports
other valid version numbers. If a version number has a valid value
per **RFC 2045**, then the header object will have non-"None"
values for the following attributes:

version

The version number as a string, with any whitespace and/or
comments removed.

major

The major version number as an integer.

minor

The minor version number as an integer.
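A short sketch with a comment in the header value, which is dropped from "version". The class is mapped explicitly so that the example does not depend on which names the default map contains; the comment text is invented.

```python
from email.headerregistry import HeaderRegistry, MIMEVersionHeader

registry = HeaderRegistry()
registry.map_to_type('MIME-Version', MIMEVersionHeader)

mv = registry('MIME-Version', '1.0 (produced by a hypothetical mailer)')
print(mv.version)          # 1.0
print(mv.major, mv.minor)  # 1 0
```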

class email.headerregistry.ParameterizedMIMEHeader

MIME headers all start with the prefix ‘Content-’. Each specific
header has a certain value, described under the class for that
header. Some can also take a list of supplemental parameters,
which have a common format. This class serves as a base for all the
MIME headers that take parameters.

params

A dictionary mapping parameter names to parameter values.

class email.headerregistry.ContentTypeHeader

A "ParameterizedMIMEHeader" class that handles the *Content-Type*
header.

content_type

The content type string, in the form "maintype/subtype".

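maintype

The main type portion of "content_type" (the part before the
"/").

subtype

The subtype portion of "content_type" (the part after the "/").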
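A sketch of the Content-Type attributes, including "params" inherited from "ParameterizedMIMEHeader". The parameter values are arbitrary examples.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
ct = registry('Content-Type', 'text/plain; charset="utf-8"; format=flowed')

print(ct.content_type)          # text/plain
print(ct.maintype, ct.subtype)  # text plain
print(ct.params['charset'])     # utf-8  (surrounding quotes removed)
print(dict(ct.params))          # {'charset': 'utf-8', 'format': 'flowed'}
```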

class email.headerregistry.ContentDispositionHeader

A "ParameterizedMIMEHeader" class that handles the *Content-
Disposition* header.

content_disposition

"inline" and "attachment" are the only valid values in common
use.
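A minimal sketch reading the disposition and one of its parameters. The file name is a placeholder.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
cd = registry('Content-Disposition', 'attachment; filename="report.pdf"')

print(cd.content_disposition)  # attachment
print(cd.params['filename'])   # report.pdf
```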

class email.headerregistry.ContentTransferEncoding

Handles the *Content-Transfer-Encoding* header.

cte

Valid values are "7bit", "8bit", "base64", and "quoted-
printable". See **RFC 2045** for more information.
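A sketch showing "cte" on a header built directly and on the header that "EmailMessage.set_content()" adds for a short plain-ASCII body, where "7bit" is chosen. The body text is arbitrary.

```python
from email.headerregistry import HeaderRegistry
from email.message import EmailMessage

registry = HeaderRegistry()
cte = registry('Content-Transfer-Encoding', 'base64')
print(cte.cte)  # base64

# set_content() picks an encoding and sets the header itself.
msg = EmailMessage()
msg.set_content('Short plain ASCII text.\n')
print(msg['Content-Transfer-Encoding'].cte)  # 7bit
```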

class email.headerregistry.HeaderRegistry(base_class=BaseHeader, default_class=UnstructuredHeader, use_default_map=True)

This is the factory used by "EmailPolicy" by default.
"HeaderRegistry" builds the class used to create a header instance
dynamically, using *base_class* and a specialized class retrieved
from a registry that it holds. When a given header name does not
appear in the registry, the class specified by *default_class* is
used as the specialized class. When *use_default_map* is "True"
(the default), the standard mapping of header names to classes is
copied into the registry during initialization. *base_class* is
always the last class in the generated class’s "__bases__" list.

The default mappings are:

subject:        UniqueUnstructuredHeader
date:           UniqueDateHeader
resent-date:    DateHeader
orig-date:      UniqueDateHeader
sender:         UniqueSingleAddressHeader
resent-sender:  SingleAddressHeader
to:             UniqueAddressHeader
resent-to:      AddressHeader
cc:             UniqueAddressHeader
resent-cc:      AddressHeader
from:           UniqueAddressHeader
resent-from:    AddressHeader
reply-to:       UniqueAddressHeader

"HeaderRegistry" has the following methods:

map_to_type(name, cls)

*name* is the name of the header to be mapped. It will be
converted to lower case in the registry. *cls* is the
specialized class to be used, along with *base_class*, to create
the class used to instantiate headers that match *name*.

__getitem__(name)

Construct and return a class to handle creating a *name* header.

__call__(name, value)

Retrieve the specialized header class associated with *name* from
the registry (using *default_class* if *name* does not appear in
the registry) and compose it with *base_class* to produce a class.
Then call that class’s constructor with the same arguments (*name*
and *value*) and return the resulting header instance.
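A sketch of "__getitem__", "__call__", and "map_to_type()" on a registry. "X-Anything" and "X-Ticket-ID" are made-up header names used only for illustration.

```python
from email.headerregistry import (BaseHeader, HeaderRegistry,
                                  UniqueAddressHeader,
                                  UniqueUnstructuredHeader,
                                  UnstructuredHeader)

registry = HeaderRegistry()

# __getitem__ composes the registered class with base_class (last).
print(registry['To'].__bases__ == (UniqueAddressHeader, BaseHeader))  # True

# Unregistered names fall back to default_class.
print(registry['X-Anything'].__bases__ == (UnstructuredHeader, BaseHeader))  # True

# map_to_type stores the name in lower case, so lookups ignore case.
registry.map_to_type('X-Ticket-ID', UniqueUnstructuredHeader)
ticket = registry('x-ticket-id', 'ABC-123')  # __call__: build class, instantiate
print(ticket.max_count)  # 1

# Without the default map, even Date falls back to default_class.
bare = HeaderRegistry(use_default_map=False)
print(issubclass(bare['Date'], UnstructuredHeader))  # True
```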

The following classes are the classes used to represent data parsed
from structured headers and can, in general, be used by an application
program to construct structured values to assign to specific headers.

class email.headerregistry.Address(display_name='', username='', domain='', addr_spec=None)

The class used to represent an email address. The general form of
an address is:

[display_name] <username@domain>

or:

username@domain

where each part must conform to specific syntax rules spelled out
in **RFC 5322**.

As a convenience *addr_spec* can be specified instead of *username*
and *domain*, in which case *username* and *domain* will be parsed
from the *addr_spec*. An *addr_spec* must be a properly RFC-quoted
string; if it is not, "Address" will raise an error. Unicode
characters are allowed and will be properly encoded when
serialized. However, per the RFCs, unicode is *not* allowed in the
username portion of the address.

display_name

The display name portion of the address, if any, with all
quoting removed. If the address does not have a display name,
this attribute will be an empty string.

username

The "username" portion of the address, with all quoting removed.

domain

The "domain" portion of the address.

addr_spec

The "username@domain" portion of the address, correctly quoted
for use as a bare address (the second form shown above). This
attribute is not mutable.

__str__()

The "str" value of the object is the address quoted according to
**RFC 5322** rules, but with no Content Transfer Encoding of any
non-ASCII characters.

To support SMTP (**RFC 5321**), "Address" handles one special case:
if "username" and "domain" are both the empty string (or "None"),
then the string value of the "Address" is "<>".
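A sketch of constructing "Address" objects: the positional form, the *addr_spec* shortcut, quoting of a display name that contains a special character, the "<>" case, and the error for a malformed *addr_spec*. All addresses are placeholders.

```python
from email.headerregistry import Address

ann = Address('Ann Example', 'ann', 'example.com')
print(ann)            # Ann Example <ann@example.com>
print(ann.addr_spec)  # ann@example.com

# addr_spec replaces username and domain; '.' is an RFC 5322 special
# character, so str() quotes the display name.
doc = Address(display_name='Dr. Who', addr_spec='who@example.org')
print(doc)            # "Dr. Who" <who@example.org>

# The SMTP special case: empty username and domain.
print(Address())      # <>

# Trailing text after a valid addr_spec is rejected.
error = None
try:
    Address(addr_spec='ann@example.com trailing-junk')
except ValueError as exc:
    error = exc
```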

class email.headerregistry.Group(display_name=None, addresses=None)

The class used to represent an address group. The general form of
an address group is:

display_name: [address-list];

As a convenience for processing lists of addresses that consist of
a mixture of groups and single addresses, a "Group" may also be
used to represent single addresses that are not part of a group by
setting *display_name* to "None" and providing a list of the single
address as *addresses*.

display_name

The "display_name" of the group. If it is "None" and there is
exactly one "Address" in "addresses", then the "Group"
represents a single address that is not in a group.

addresses

A possibly empty tuple of "Address" objects representing the
addresses in the group.

__str__()

The "str" value of a "Group" is formatted according to **RFC
5322**, but with no Content Transfer Encoding of any non-ASCII
characters. If "display_name" is "None" and there is a single
"Address" in the "addresses" list, the "str" value will be the
same as the "str" of that single "Address".
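A sketch of "Group" objects, including a single-address pseudo-group and an empty group, and of assigning a mixed list to an address header. The names and addresses are placeholders.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

team = Group('Team', [Address('Bob', 'bob', 'example.com'),
                      Address(addr_spec='carol@example.com')])
solo = Group(None, [Address(addr_spec='ann@example.com')])
empty = Group('Undisclosed recipients')

print(team)   # Team: Bob <bob@example.com>, carol@example.com;
print(solo)   # ann@example.com
print(empty)  # Undisclosed recipients:;

# A list of groups (or Address objects) can be assigned to an address header.
msg = EmailMessage()
msg['To'] = [solo, team]
print(msg['To'])
# ann@example.com, Team: Bob <bob@example.com>, carol@example.com;
```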

-[ Footnotes ]-

[1] Originally added in 3.3 as a *provisional module*.