Python 3.6.5 Documentation > Defining New Types
Defining New Types
******************

As mentioned in the last chapter, Python allows the writer of an extension module to define new types that can be manipulated from Python code, much like strings and lists in core Python.

This is not hard; the code for all extension types follows a pattern, but there are some details that you need to understand before you can get started.
The Basics
==========

The Python runtime sees all Python objects as variables of type "PyObject*", which serves as a “base type” for all Python objects. "PyObject" itself only contains the refcount and a pointer to the object’s “type object”. This is where the action is; the type object determines which (C) functions get called when, for instance, an attribute gets looked up on an object or it is multiplied by another object. These C functions are called “type methods”.

So, if you want to define a new object type, you need to create a new type object.

This sort of thing can only be explained by example, so here’s a minimal, but complete, module that defines a new type:

   #include <Python.h>

   typedef struct {
       PyObject_HEAD
       /* Type-specific fields go here. */
   } noddy_NoddyObject;

   static PyTypeObject noddy_NoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "noddy.Noddy",             /* tp_name */
       sizeof(noddy_NoddyObject), /* tp_basicsize */
       0,                         /* tp_itemsize */
       0,                         /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_as_async */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT,        /* tp_flags */
       "Noddy objects",           /* tp_doc */
   };

   static PyModuleDef noddymodule = {
       PyModuleDef_HEAD_INIT,
       "noddy",
       "Example module that creates an extension type.",
       -1,
       NULL, NULL, NULL, NULL, NULL
   };

   PyMODINIT_FUNC
   PyInit_noddy(void)
   {
       PyObject* m;

       noddy_NoddyType.tp_new = PyType_GenericNew;
       if (PyType_Ready(&noddy_NoddyType) < 0)
           return NULL;

       m = PyModule_Create(&noddymodule);
       if (m == NULL)
           return NULL;

       Py_INCREF(&noddy_NoddyType);
       PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);
       return m;
   }

Now that’s quite a bit to take in at once, but hopefully bits will seem familiar from the last chapter.
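The runtime model described at the start of this section can be made concrete with a simplified, self-contained C model. This is not Python/C API code; the names ToyObject, ToyType, toy_repr() and the two sample types are hypothetical. Every object carries a pointer to its type. A generic operation looks up a function pointer (a “type method”) on that type and falls back to a default when the slot is empty, loosely like "tp_repr".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for PyObject and
   PyTypeObject: each object starts with a refcount and a pointer to its
   type, and the type holds the "type methods" as function pointers. */
struct ToyType;

typedef struct {
    long refcnt;                  /* plays the role of ob_refcnt */
    const struct ToyType *type;   /* plays the role of ob_type */
} ToyObject;

typedef int (*toy_reprfunc)(ToyObject *self, char *buf, size_t len);

typedef struct ToyType {
    const char *name;             /* like tp_name */
    toy_reprfunc repr;            /* like tp_repr; NULL means "use a default" */
} ToyType;

/* Generic operation: find the slot on the object's type and call it,
   falling back to a default when the slot is empty. */
int toy_repr(ToyObject *obj, char *buf, size_t len)
{
    assert(obj != NULL && obj->type != NULL);
    const ToyType *t = obj->type;
    if (t->repr != NULL)
        return t->repr(obj, buf, len);
    return snprintf(buf, len, "<%s object>", t->name);
}

static int shouty_repr(ToyObject *self, char *buf, size_t len)
{
    (void)self;
    return snprintf(buf, len, "SHOUTY!");
}

const ToyType plain_type = {"toy.Plain", NULL};           /* no repr slot */
const ToyType shouty_type = {"toy.Shouty", shouty_repr};  /* custom repr */
```

With this model, an object whose type is "plain_type" is rendered as "<toy.Plain object>". An object whose type is "shouty_type" is rendered by its own slot function instead. The real default representation also includes the object’s address.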
The first bit that will be new is:

   typedef struct {
       PyObject_HEAD
   } noddy_NoddyObject;

This is what a Noddy object will contain. In this case, that is nothing more than what every Python object contains: a field called "ob_base" of type "PyObject". "PyObject", in turn, contains an "ob_refcnt" field and a pointer to a type object. These can be accessed using the macros "Py_REFCNT" and "Py_TYPE" respectively. These are the fields the "PyObject_HEAD" macro brings in. The reason for the macro is to standardize the layout and to enable special debugging fields in debug builds.

Note that there is no semicolon after the "PyObject_HEAD" macro; one is included in the macro definition. Be wary of adding one by accident; it’s easy to do from habit, and your compiler might not complain, but someone else’s probably will! (On Windows, MSVC is known to call this an error and refuse to compile the code.)

For contrast, let’s take a look at the corresponding definition for standard Python floats:

   typedef struct {
       PyObject_HEAD
       double ob_fval;
   } PyFloatObject;

Moving on, we come to the crunch: the type object.

   static PyTypeObject noddy_NoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "noddy.Noddy",             /* tp_name */
       sizeof(noddy_NoddyObject), /* tp_basicsize */
       0,                         /* tp_itemsize */
       0,                         /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_as_async */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT,        /* tp_flags */
       "Noddy objects",           /* tp_doc */
   };

Now if you go and look up the definition of "PyTypeObject" in "object.h" you’ll see that it has many more fields than the definition above. The remaining fields will be filled with zeros by the C compiler, and it’s common practice to not specify them explicitly unless you need them.
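The sketch below is a simplified model, not the real headers. "ToyHead" stands in for the fields "PyObject_HEAD" provides, and "ToyFloat", "toy_refcnt()" and "ToyTypeObject" are hypothetical. It illustrates two points from the text above:

* Because the header is the first member, a pointer to the whole struct can be treated as a pointer to the header.
* Members left out of a static initializer are zero-initialized by the C compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields PyObject_HEAD contributes. */
typedef struct {
    long refcnt;
    const void *type;
} ToyHead;

/* Same shape as PyFloatObject: header first, type-specific data after. */
typedef struct {
    ToyHead ob_base;
    double fval;
} ToyFloat;

/* Generic code only needs the header. C guarantees that a pointer to a
   struct, suitably converted, points to its first member. */
long toy_refcnt(const void *obj)
{
    assert(obj != NULL);
    return ((const ToyHead *)obj)->refcnt;
}

/* A cut-down "type object". Members omitted from the initializer below
   are zero-initialized, so the unused slots are NULL. */
typedef struct {
    const char *name;
    size_t basicsize;
    void (*dealloc)(void *);
    const char *doc;
} ToyTypeObject;

const ToyTypeObject toy_float_type = {
    "toy.Float",          /* name */
    sizeof(ToyFloat),     /* basicsize; dealloc and doc are left out */
};
```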
This is so important that we’re going to pick the top of it apart still further:

   PyVarObject_HEAD_INIT(NULL, 0)

This line is a bit of a wart; what we’d like to write is:

   PyVarObject_HEAD_INIT(&PyType_Type, 0)

as the type of a type object is “type”, but this isn’t strictly conforming C and some compilers complain. Fortunately, this member will be filled in for us by "PyType_Ready()".

   "noddy.Noddy",              /* tp_name */

The name of our type. This will appear in the default textual representation of our objects and in some error messages, for example:

   >>> "" + noddy.Noddy()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: must be str, not noddy.Noddy

Note that the name is a dotted name that includes both the module name and the name of the type within the module. The module in this case is "noddy" and the type is "Noddy", so we set the type name to "noddy.Noddy". One side effect of using an undotted name is that the pydoc documentation tool will not list the new type in the module documentation.

   sizeof(noddy_NoddyObject),  /* tp_basicsize */

This is so that Python knows how much memory to allocate when you call "PyObject_New()".

Note:

  If you want your type to be subclassable from Python, and your type has the same "tp_basicsize" as its base type, you may have problems with multiple inheritance. A Python subclass of your type will have to list your type first in its "__bases__", or else it will not be able to call your type’s "__new__()" method without getting an error. You can avoid this problem by ensuring that your type has a larger value for "tp_basicsize" than its base type does. Most of the time, this will be true anyway, because either your base type will be "object", or else you will be adding data members to your base type, and therefore increasing its size.

   0,                          /* tp_itemsize */

This has to do with variable length objects like lists and strings. Ignore this for now.
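Below is a rough, hypothetical model of how a size field like "tp_basicsize" is used. It is not CPython’s allocator, and every name in it is invented for illustration. A generic allocator requests "basicsize" zeroed bytes and fills in the header. That is also why a generic new function, loosely like "PyType_GenericNew()", leaves every type-specific member as 0 or NULL.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyType ToyType;

typedef struct {
    long refcnt;
    ToyType *type;
} ToyObject;

struct ToyType {
    const char *name;
    size_t basicsize;       /* like tp_basicsize: bytes per instance */
};

/* Allocate basicsize zeroed bytes and fill in the header; every field
   after the header starts out as 0 (NULL on mainstream platforms). */
ToyObject *toy_generic_new(ToyType *type)
{
    assert(type->basicsize >= sizeof(ToyObject));
    ToyObject *obj = calloc(1, type->basicsize);
    if (obj == NULL)
        return NULL;        /* the real allocator would raise MemoryError */
    obj->refcnt = 1;
    obj->type = type;
    return obj;
}

/* An instance layout with extra data after the header. */
typedef struct {
    ToyObject head;
    void *first;
    int number;
} ToyNoddy;

ToyType toy_noddy_type = {"toy.Noddy", sizeof(ToyNoddy)};
```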
Skipping a number of type methods that we don’t provide, we set the class flags to "Py_TPFLAGS_DEFAULT".

   Py_TPFLAGS_DEFAULT,        /* tp_flags */

All types should include this constant in their flags. It enables all of the members defined until at least Python 3.3. If you need further members, you will need to OR the corresponding flags.

We provide a doc string for the type in "tp_doc".

   "Noddy objects",           /* tp_doc */

Now we get into the type methods, the things that make your objects different from the others. We aren’t going to implement any of these in this version of the module. We’ll expand this example later to have more interesting behavior.

For now, all we want to be able to do is to create new "Noddy" objects. To enable object creation, we have to provide a "tp_new" implementation. In this case, we can just use the default implementation provided by the API function "PyType_GenericNew()". We’d like to just assign this to the "tp_new" slot, but we can’t, for portability’s sake: on some platforms or compilers, we can’t statically initialize a structure member with a function defined in another C module. Instead, we’ll assign the "tp_new" slot in the module initialization function just before calling "PyType_Ready()":

   noddy_NoddyType.tp_new = PyType_GenericNew;
   if (PyType_Ready(&noddy_NoddyType) < 0)
       return NULL;

All the other type methods are *NULL*; we’ll go over them in a later section.

Everything else in the file should be familiar, except for some code in "PyInit_noddy()":

   if (PyType_Ready(&noddy_NoddyType) < 0)
       return NULL;

This initializes the "Noddy" type, filling in a number of members, including "ob_type" that we initially set to *NULL*.

   PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);

This adds the type to the module dictionary. This allows us to create "Noddy" instances by calling the "Noddy" class:

   >>> import noddy
   >>> mynoddy = noddy.Noddy()

That’s it!
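The order of operations in "PyInit_noddy()" can be modeled in plain C. This sketch uses hypothetical names throughout and is not the real API. The static type object starts with a NULL type pointer and an empty "tp_new". Module initialization assigns the slot at run time and then calls a ready function that fills in the missing type pointer, much as "PyType_Ready()" fills in "ob_type". The flag bits are also invented, but they are combined and tested with bitwise operators in the same way as "Py_TPFLAGS_*" values.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyType ToyType;

typedef struct {
    long refcnt;
    ToyType *type;
} ToyObject;

typedef ToyObject *(*toy_newfunc)(ToyType *type);

struct ToyType {
    ToyObject ob_base;          /* a type is itself an object */
    const char *name;
    unsigned long flags;
    toy_newfunc tp_new;
};

/* Hypothetical flag bits, combined with | like Py_TPFLAGS_* values. */
#define TOY_FLAG_DEFAULT  (1UL << 0)
#define TOY_FLAG_BASETYPE (1UL << 1)

/* Stands in for PyType_Type, the type of type objects. */
ToyType toy_type_type = {{1, NULL}, "type", TOY_FLAG_DEFAULT, NULL};

/* Placeholder for PyType_GenericNew(); a real one would allocate. */
static ToyObject *toy_generic_new(ToyType *type)
{
    (void)type;
    return NULL;
}

/* Like noddy_NoddyType: the type pointer starts out NULL, mirroring
   PyVarObject_HEAD_INIT(NULL, 0), and tp_new is left empty. */
ToyType toy_noddy_type = {
    {1, NULL},
    "toy.Noddy",
    TOY_FLAG_DEFAULT,           /* no TOY_FLAG_BASETYPE: not subclassable */
    NULL,
};

/* Models one job of PyType_Ready(): fill in the type's own type
   pointer. Returns 0 on success and -1 on error, like the real call. */
int toy_type_ready(ToyType *t)
{
    assert(t != NULL);
    if (t->name == NULL)
        return -1;
    if (t->ob_base.type == NULL)
        t->ob_base.type = &toy_type_type;
    return 0;
}

/* Mirrors the start of PyInit_noddy(): set tp_new at run time, then ready. */
int toy_module_init(void)
{
    toy_noddy_type.tp_new = toy_generic_new;
    if (toy_type_ready(&toy_noddy_type) < 0)
        return -1;
    return 0;
}
```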
All that remains is to build it; put the above code in a file called "noddy.c" and

   from distutils.core import setup, Extension
   setup(name="noddy", version="1.0",
         ext_modules=[Extension("noddy", ["noddy.c"])])

in a file called "setup.py"; then typing

   $ python setup.py build

at a shell should produce a shared library in a subdirectory of "build". It is named "noddy.so" or a platform-specific variant of that name. Move to that directory and fire up Python; you should be able to "import noddy" and play around with Noddy objects.

That wasn’t so hard, was it?

Of course, the current Noddy type is pretty uninteresting. It has no data and doesn’t do anything. It can’t even be subclassed.
Adding data and methods to the Basic example
--------------------------------------------

Let’s extend the basic example to add some data and methods. Let’s also make the type usable as a base class. We’ll create a new module, "noddy2", that adds these capabilities:

   #include <Python.h>
   #include "structmember.h"

   typedef struct {
       PyObject_HEAD
       PyObject *first; /* first name */
       PyObject *last;  /* last name */
       int number;
   } Noddy;

   static void
   Noddy_dealloc(Noddy* self)
   {
       Py_XDECREF(self->first);
       Py_XDECREF(self->last);
       Py_TYPE(self)->tp_free((PyObject*)self);
   }

   static PyObject *
   Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       Noddy *self;

       self = (Noddy *)type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->number = 0;
       }

       return (PyObject *)self;
   }

   static int
   Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
   {
       PyObject *first=NULL, *last=NULL, *tmp;

       static char *kwlist[] = {"first", "last", "number", NULL};

       if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                         &first, &last,
                                         &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_XDECREF(tmp);
       }

       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_XDECREF(tmp);
       }

       return 0;
   }
   static PyMemberDef Noddy_members[] = {
       {"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
        "first name"},
       {"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
        "last name"},
       {"number", T_INT, offsetof(Noddy, number), 0,
        "noddy number"},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Noddy_name(Noddy* self)
   {
       if (self->first == NULL) {
           PyErr_SetString(PyExc_AttributeError, "first");
           return NULL;
       }

       if (self->last == NULL) {
           PyErr_SetString(PyExc_AttributeError, "last");
           return NULL;
       }

       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

   static PyMethodDef Noddy_methods[] = {
       {"name", (PyCFunction)Noddy_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

   static PyTypeObject NoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "noddy2.Noddy",            /* tp_name */
       sizeof(Noddy),             /* tp_basicsize */
       0,                         /* tp_itemsize */
       (destructor)Noddy_dealloc, /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_as_async */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT |
           Py_TPFLAGS_BASETYPE,   /* tp_flags */
       "Noddy objects",           /* tp_doc */
       0,                         /* tp_traverse */
       0,                         /* tp_clear */
       0,                         /* tp_richcompare */
       0,                         /* tp_weaklistoffset */
       0,                         /* tp_iter */
       0,                         /* tp_iternext */
       Noddy_methods,             /* tp_methods */
       Noddy_members,             /* tp_members */
       0,                         /* tp_getset */
       0,                         /* tp_base */
       0,                         /* tp_dict */
       0,                         /* tp_descr_get */
       0,                         /* tp_descr_set */
       0,                         /* tp_dictoffset */
       (initproc)Noddy_init,      /* tp_init */
       0,                         /* tp_alloc */
       Noddy_new,                 /* tp_new */
   };

   static PyModuleDef noddy2module = {
       PyModuleDef_HEAD_INIT,
       "noddy2",
       "Example module that creates an extension type.",
       -1,
       NULL, NULL, NULL, NULL, NULL
   };

   PyMODINIT_FUNC
   PyInit_noddy2(void)
   {
       PyObject* m;

       if (PyType_Ready(&NoddyType) < 0)
           return NULL;

       m = PyModule_Create(&noddy2module);
       if (m == NULL)
           return NULL;

       Py_INCREF(&NoddyType);
       PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
       return m;
   }

This version of the module has a number of changes.

We’ve added an extra include:

   #include "structmember.h"

This include provides declarations that we use to handle attributes, as described a bit later.

The name of the "Noddy" object structure has been shortened to "Noddy". The type object name has been shortened to "NoddyType".

The "Noddy" type now has three data attributes, *first*, *last*, and *number*. The *first* and *last* variables are Python strings containing first and last names. The *number* attribute is an integer.

The object structure is updated accordingly:

   typedef struct {
       PyObject_HEAD
       PyObject *first;
       PyObject *last;
       int number;
   } Noddy;

Because we now have data to manage, we have to be more careful about object allocation and deallocation. At a minimum, we need a deallocation method:

   static void
   Noddy_dealloc(Noddy* self)
   {
       Py_XDECREF(self->first);
       Py_XDECREF(self->last);
       Py_TYPE(self)->tp_free((PyObject*)self);
   }

which is assigned to the "tp_dealloc" member:

   (destructor)Noddy_dealloc, /* tp_dealloc */

This method decrements the reference counts of the two Python attributes. We use "Py_XDECREF()" here because the "first" and "last" members could be *NULL*. It then calls the "tp_free" member of the object’s type to free the object’s memory. Note that the object’s type might not be "NoddyType", because the object may be an instance of a subclass.
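Why call "tp_free" through "Py_TYPE(self)" instead of a fixed function? The simplified model below uses hypothetical names and is not the real API. It gives a base type and a "subclass" different free functions. A single deallocator written for the base type still releases each object with the free function of the object’s actual type.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyType ToyType;

typedef struct {
    long refcnt;
    ToyType *type;             /* like ob_type, read via Py_TYPE() */
} ToyObject;

struct ToyType {
    const char *name;
    void (*tp_free)(void *);   /* like tp_free */
};

/* Counters so the example can tell which free function ran. */
int base_frees = 0;
int sub_frees = 0;

static void base_free(void *p) { base_frees++; free(p); }
static void sub_free(void *p)  { sub_frees++;  free(p); }

ToyType toy_base_type = {"toy.Noddy", base_free};
ToyType toy_sub_type = {"toy.SubNoddy", sub_free};   /* a "subclass" */

ToyObject *toy_alloc(ToyType *type)
{
    ToyObject *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        o->type = type;
    }
    return o;
}

/* The base type's deallocator. Like Noddy_dealloc(), it releases memory
   through the object's actual type instead of hard-coding base_free. */
void toy_noddy_dealloc(ToyObject *self)
{
    assert(self->refcnt == 0);
    self->type->tp_free(self);
}
```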
We want to make sure that the first and last names are initialized to empty strings, so we provide a new method:

   static PyObject *
   Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       Noddy *self;

       self = (Noddy *)type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->number = 0;
       }

       return (PyObject *)self;
   }

and install it in the "tp_new" member:

   Noddy_new,                 /* tp_new */

The new member is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the "__new__()" method. See the paper titled “Unifying types and classes in Python” for a detailed discussion of the "__new__()" method. One reason to implement a new method is to ensure the initial values of instance variables. In this case, we use the new method to make sure that the initial values of the members "first" and "last" are not *NULL*. If we didn’t care whether the initial values were *NULL*, we could have used "PyType_GenericNew()" as our new method, as we did before. "PyType_GenericNew()" initializes all of the instance variable members to *NULL*.

The new method is a static method that is passed the type being instantiated and any arguments passed when the type was called, and that returns the new object created. New methods always accept positional and keyword arguments, but they often ignore the arguments, leaving the argument handling to initializer methods. Note that if the type supports subclassing, the type passed may not be the type being defined. The new method calls the "tp_alloc" slot to allocate memory. We don’t fill the "tp_alloc" slot ourselves. Rather, "PyType_Ready()" fills it for us by inheriting it from our base class, which is "object" by default. Most types use the default allocation.
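Slot inheritance can be sketched the same way. The model below is a simplification of one thing "PyType_Ready()" does, not its actual code, and all names are hypothetical. It does three things:

* A type whose base is unset gets "object" as its base.
* The base is readied first.
* An empty "alloc" slot is copied from the base.

This is how a type that leaves "tp_alloc" as 0 still ends up with a working allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyType ToyType;
typedef void *(*toy_allocfunc)(ToyType *type);

struct ToyType {
    const char *name;
    size_t basicsize;
    ToyType *base;          /* like tp_base; NULL means "inherit from object" */
    toy_allocfunc alloc;    /* like tp_alloc */
};

/* The root type's allocator: zeroed memory of the requested type's size. */
static void *object_alloc(ToyType *type)
{
    return calloc(1, type->basicsize);
}

/* The sizes used here are illustrative only. */
ToyType toy_object_type = {"object", 16, NULL, object_alloc};

/* Default the base to "object", ready the base first, then copy any
   empty slot from it. Returns 0 on success and -1 on error. */
int toy_type_ready(ToyType *t)
{
    assert(t != NULL);
    if (t == &toy_object_type)
        return 0;
    if (t->base == NULL)
        t->base = &toy_object_type;
    if (toy_type_ready(t->base) < 0)
        return -1;
    if (t->alloc == NULL)
        t->alloc = t->base->alloc;
    return 0;
}

/* Like NoddyType: the alloc slot is deliberately left as 0. */
ToyType toy_noddy_type = {"toy.Noddy", 32, NULL, NULL};
```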
Note:

  If you are creating a co-operative "tp_new" (one that calls a base type’s "tp_new" or "__new__()"), you must *not* try to determine what method to call using method resolution order at runtime. Always statically determine what type you are going to call, and call its "tp_new" directly, or via "type->tp_base->tp_new". If you do not do this, Python subclasses of your type that also inherit from other Python-defined classes may not work correctly. (Specifically, you may not be able to create instances of such subclasses without getting a "TypeError".)

We provide an initialization function:

   static int
   Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
   {
       PyObject *first=NULL, *last=NULL, *tmp;

       static char *kwlist[] = {"first", "last", "number", NULL};

       if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                         &first, &last,
                                         &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_XDECREF(tmp);
       }

       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_XDECREF(tmp);
       }

       return 0;
   }

by filling the "tp_init" slot.

   (initproc)Noddy_init,      /* tp_init */

The "tp_init" slot is exposed in Python as the "__init__()" method. It is used to initialize an object after it’s created. Unlike the new method, we can’t guarantee that the initializer is called. The initializer isn’t called when unpickling objects and it can be overridden. Our initializer accepts arguments to provide initial values for our instance. Initializers always accept positional and keyword arguments. Initializers should return either "0" on success or "-1" on error.

Initializers can be called multiple times. Anyone can call the "__init__()" method on our objects. For this reason, we have to be extra careful when assigning the new values. We might be tempted, for example, to assign the "first" member like this:

   if (first) {
       Py_XDECREF(self->first);
       Py_INCREF(first);
       self->first = first;
   }

But this would be risky.
Our type doesn’t restrict the type of the "first" member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the "first" member. To be paranoid and protect ourselves against this possibility, we almost always reassign members before decrementing their reference counts. When don’t we have to do this?

* when we absolutely know that the reference count is greater than 1

* when we know that deallocation of the object [1] will not cause any calls back into our type’s code

* when decrementing a reference count in a "tp_dealloc" handler when garbage collection is not supported [2]

We want to expose our instance variables as attributes. There are a number of ways to do that. The simplest way is to define member definitions:

   static PyMemberDef Noddy_members[] = {
       {"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
        "first name"},
       {"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
        "last name"},
       {"number", T_INT, offsetof(Noddy, number), 0,
        "noddy number"},
       {NULL}  /* Sentinel */
   };

and put the definitions in the "tp_members" slot:

   Noddy_members,             /* tp_members */

Each member definition has a member name, type, offset, access flags and documentation string. See the Generic Attribute Management section below for details.

A disadvantage of this approach is that it doesn’t provide a way to restrict the types of objects that can be assigned to the Python attributes. We expect the first and last names to be strings, but any Python objects can be assigned. Further, the attributes can be deleted, setting the C pointers to *NULL*. Even though we can make sure the members are initialized to non-*NULL* values, the members can be set to *NULL* if the attributes are deleted.

We define a single method, "name()", that outputs the object’s name as the concatenation of the first and last names.
   static PyObject *
   Noddy_name(Noddy* self)
   {
       if (self->first == NULL) {
           PyErr_SetString(PyExc_AttributeError, "first");
           return NULL;
       }

       if (self->last == NULL) {
           PyErr_SetString(PyExc_AttributeError, "last");
           return NULL;
       }

       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

The method is implemented as a C function that takes a "Noddy" (or "Noddy" subclass) instance as the first argument. Methods always take an instance as the first argument. Methods often take positional and keyword arguments as well, but in this case we don’t take any and don’t need to accept a positional argument tuple or keyword argument dictionary. This method is equivalent to the Python method:

   def name(self):
       return "%s %s" % (self.first, self.last)

Note that we have to check for the possibility that our "first" and "last" members are *NULL*. This is because they can be deleted, in which case they are set to *NULL*. It would be better to prevent deletion of these attributes and to restrict the attribute values to be strings. We’ll see how to do that in the next section.

Now that we’ve defined the method, we need to create an array of method definitions:

   static PyMethodDef Noddy_methods[] = {
       {"name", (PyCFunction)Noddy_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

and assign them to the "tp_methods" slot:

   Noddy_methods,             /* tp_methods */

Note that we used the "METH_NOARGS" flag to indicate that the method expects no arguments other than *self*.

Finally, we’ll make our type usable as a base class. We’ve written our methods carefully so far so that they don’t make any assumptions about the type of the object being created or used, so all we need to do is to add the "Py_TPFLAGS_BASETYPE" flag to our class flag definition:

   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */

We rename "PyInit_noddy()" to "PyInit_noddy2()" and update the module name in the "PyModuleDef" struct.
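The reassign-before-decrement rule discussed for "Noddy_init()" can be demonstrated without Python at all. In this hypothetical model, "Toy" objects have a reference count and an optional hook that runs when they are deallocated. The hook stands in for a destructor that executes arbitrary code. Because "holder_set_first()" stores the new value before dropping the old one, the hook inspects the holder and sees the new value, not a member that is in the middle of being freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object with an optional hook that runs during
   deallocation, standing in for a destructor that executes arbitrary code. */
typedef struct Toy Toy;
struct Toy {
    long refcnt;
    void (*on_dealloc)(Toy *self);
};

typedef struct {
    Toy *first;                /* plays the role of Noddy.first */
} Holder;

Holder *watched = NULL;        /* the holder the hook inspects */
Toy *seen_by_hook = NULL;      /* what the hook found in watched->first */

Toy *toy_new(void (*hook)(Toy *))
{
    Toy *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        o->on_dealloc = hook;
    }
    return o;
}

/* Like Py_XDECREF: NULL is ignored; at zero, run the hook, then free. */
void toy_decref(Toy *o)
{
    if (o == NULL)
        return;
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        if (o->on_dealloc != NULL)
            o->on_dealloc(o);
        free(o);
    }
}

/* A hook that reads the holder's member while an object is dying. */
void peek_hook(Toy *self)
{
    (void)self;
    seen_by_hook = watched->first;
}

/* The safe order from Noddy_init(): store the new value, then release
   the old one, so code run by the old value's destructor only ever sees
   a valid member. */
void holder_set_first(Holder *h, Toy *value)
{
    Toy *tmp = h->first;
    value->refcnt++;           /* Py_INCREF(value) */
    h->first = value;
    toy_decref(tmp);           /* Py_XDECREF(tmp) */
}
```

Suppose the last two statements of "holder_set_first()" were swapped, so the old value is decremented before the assignment. The hook would then read "watched->first" while it still points at the object being freed. That is exactly the hazard described above.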
Finally, we update our "setup.py" file to build the new module:

   from distutils.core import setup, Extension
   setup(name="noddy", version="1.0",
         ext_modules=[
            Extension("noddy", ["noddy.c"]),
            Extension("noddy2", ["noddy2.c"]),
            ])
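The "PyMemberDef" table used in "noddy2" pairs a name and a type code with an "offsetof()" value. The simplified, table-driven attribute reader below illustrates that idea. It is a hypothetical model, not the structmember implementation: "ToyMemberDef", "toy_get_member()" and the type codes are invented, and values are returned as a "double" for brevity. It scans the table up to the "{NULL}" sentinel and reads the field at the recorded offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of PyMemberDef. */
enum toy_member_type { TOY_T_INT, TOY_T_DOUBLE };

typedef struct {
    const char *name;              /* a NULL name marks the sentinel */
    enum toy_member_type type;
    size_t offset;                 /* byte offset into the instance */
} ToyMemberDef;

typedef struct {
    long refcnt;                   /* header fields would go here */
    int number;
    double weight;
} ToyNoddy;

const ToyMemberDef toy_members[] = {
    {"number", TOY_T_INT, offsetof(ToyNoddy, number)},
    {"weight", TOY_T_DOUBLE, offsetof(ToyNoddy, weight)},
    {NULL}                         /* sentinel */
};

/* Generic getter: find the entry by name, then read the field at
   obj + offset. Returns 0 on success, -1 if the name is unknown. */
int toy_get_member(const void *obj, const ToyMemberDef *defs,
                   const char *name, double *out)
{
    assert(obj != NULL && out != NULL);
    for (const ToyMemberDef *d = defs; d->name != NULL; d++) {
        if (strcmp(d->name, name) != 0)
            continue;
        const char *addr = (const char *)obj + d->offset;
        switch (d->type) {
        case TOY_T_INT:    *out = *(const int *)addr;    return 0;
        case TOY_T_DOUBLE: *out = *(const double *)addr; return 0;
        }
    }
    return -1;   /* a real attribute lookup would end in AttributeError */
}
```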
Providing finer control over data attributes
--------------------------------------------

In this section, we’ll provide finer control over how the "first" and "last" attributes are set in the "Noddy" example. In the previous version of our module, the instance variables "first" and "last" could be set to non-string values or even deleted. We want to make sure that these attributes always contain strings.

   #include <Python.h>
   #include "structmember.h"

   typedef struct {
       PyObject_HEAD
       PyObject *first;
       PyObject *last;
       int number;
   } Noddy;

   static void
   Noddy_dealloc(Noddy* self)
   {
       Py_XDECREF(self->first);
       Py_XDECREF(self->last);
       Py_TYPE(self)->tp_free((PyObject*)self);
   }

   static PyObject *
   Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       Noddy *self;

       self = (Noddy *)type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->number = 0;
       }

       return (PyObject *)self;
   }

   static int
   Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
   {
       PyObject *first=NULL, *last=NULL, *tmp;

       static char *kwlist[] = {"first", "last", "number", NULL};

       if (! PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                         &first, &last,
                                         &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_DECREF(tmp);
       }

       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_DECREF(tmp);
       }

       return 0;
   }

   static PyMemberDef Noddy_members[] = {
       {"number", T_INT, offsetof(Noddy, number), 0,
        "noddy number"},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Noddy_getfirst(Noddy *self, void *closure)
   {
       Py_INCREF(self->first);
       return self->first;
   }

   static int
   Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
   {
       PyObject *tmp;

       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
           return -1;
       }

       if (! PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The first attribute value must be a string");
           return -1;
       }

       tmp = self->first;
       Py_INCREF(value);
       self->first = value;
       Py_DECREF(tmp);

       return 0;
   }

   static PyObject *
   Noddy_getlast(Noddy *self, void *closure)
   {
       Py_INCREF(self->last);
       return self->last;
   }

   static int
   Noddy_setlast(Noddy *self, PyObject *value, void *closure)
   {
       PyObject *tmp;

       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
           return -1;
       }

       if (! PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The last attribute value must be a string");
           return -1;
       }

       tmp = self->last;
       Py_INCREF(value);
       self->last = value;
       Py_DECREF(tmp);

       return 0;
   }

   static PyGetSetDef Noddy_getseters[] = {
       {"first",
        (getter)Noddy_getfirst, (setter)Noddy_setfirst,
        "first name",
        NULL},
       {"last",
        (getter)Noddy_getlast, (setter)Noddy_setlast,
        "last name",
        NULL},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Noddy_name(Noddy* self)
   {
       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

   static PyMethodDef Noddy_methods[] = {
       {"name", (PyCFunction)Noddy_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

   static PyTypeObject NoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "noddy3.Noddy",            /* tp_name */
       sizeof(Noddy),             /* tp_basicsize */
       0,                         /* tp_itemsize */
       (destructor)Noddy_dealloc, /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_as_async */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT |
           Py_TPFLAGS_BASETYPE,   /* tp_flags */
       "Noddy objects",           /* tp_doc */
       0,                         /* tp_traverse */
       0,                         /* tp_clear */
       0,                         /* tp_richcompare */
       0,                         /* tp_weaklistoffset */
       0,                         /* tp_iter */
       0,                         /* tp_iternext */
       Noddy_methods,             /* tp_methods */
       Noddy_members,             /* tp_members */
       Noddy_getseters,           /* tp_getset */
       0,                         /* tp_base */
       0,                         /* tp_dict */
       0,                         /* tp_descr_get */
       0,                         /* tp_descr_set */
       0,                         /* tp_dictoffset */
       (initproc)Noddy_init,      /* tp_init */
       0,                         /* tp_alloc */
       Noddy_new,                 /* tp_new */
   };

   static PyModuleDef noddy3module = {
       PyModuleDef_HEAD_INIT,
       "noddy3",
       "Example module that creates an extension type.",
       -1,
       NULL, NULL, NULL, NULL, NULL
   };

   PyMODINIT_FUNC
   PyInit_noddy3(void)
   {
       PyObject* m;

       if (PyType_Ready(&NoddyType) < 0)
           return NULL;

       m = PyModule_Create(&noddy3module);
       if (m == NULL)
           return NULL;

       Py_INCREF(&NoddyType);
       PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
       return m;
   }

To provide greater control over the "first" and "last" attributes, we’ll use custom getter and setter functions. Here are the functions for getting and setting the "first" attribute:

   static PyObject *
   Noddy_getfirst(Noddy *self, void *closure)
   {
       Py_INCREF(self->first);
       return self->first;
   }

   static int
   Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
   {
       PyObject *tmp;

       if (value == NULL) {
           PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
           return -1;
       }

       if (! PyUnicode_Check(value)) {
           PyErr_SetString(PyExc_TypeError,
                           "The first attribute value must be a string");
           return -1;
       }

       tmp = self->first;
       Py_INCREF(value);
       self->first = value;
       Py_DECREF(tmp);

       return 0;
   }

The getter function is passed a "Noddy" object and a “closure”, which is a void pointer. In this case, the closure is ignored. (The closure supports an advanced usage in which definition data is passed to the getter and setter. This could, for example, be used to allow a single set of getter and setter functions that decide the attribute to get or set based on data in the closure.)

The setter function is passed the "Noddy" object, the new value, and the closure. The new value may be *NULL*, in which case the attribute is being deleted. In our setter, we raise an error if the attribute is deleted or if the attribute value is not a string. As in "Noddy_init()", the setter stores the new value before releasing the old one, because "PyUnicode_Check()" also accepts instances of "str" subclasses, whose destructors could run arbitrary code.
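The closure mentioned above makes it possible to share one getter and one setter between several attributes. The sketch below is a hypothetical model, not Python/C API code. "ToyValue", "ToyGetSetDef" and the functions are invented, reference counting is left out, and a global "toy_error" string stands in for raising an exception. Each table entry passes the byte offset of its field as the closure. The setter rejects deletion (a NULL value) and non-string values, just as "Noddy_setfirst()" does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy value: only its "kind" tag matters here. */
typedef enum { TOY_STR, TOY_INT } ToyKind;
typedef struct { ToyKind kind; const char *text; } ToyValue;

typedef struct {
    ToyValue *first;
    ToyValue *last;
} ToyNoddy;

typedef ToyValue *(*toy_getter)(ToyNoddy *self, void *closure);
typedef int (*toy_setter)(ToyNoddy *self, ToyValue *value, void *closure);

typedef struct {
    const char *name;
    toy_getter get;
    toy_setter set;
    void *closure;              /* here: points at the field's offset */
} ToyGetSetDef;

const char *toy_error = NULL;   /* stands in for a raised exception */

/* Use the closure to locate the field this entry describes. */
static ToyValue **field_ptr(ToyNoddy *self, void *closure)
{
    assert(self != NULL && closure != NULL);
    size_t off = *(const size_t *)closure;
    return (ToyValue **)((char *)self + off);
}

ToyValue *toy_get_name_part(ToyNoddy *self, void *closure)
{
    return *field_ptr(self, closure);
}

int toy_set_name_part(ToyNoddy *self, ToyValue *value, void *closure)
{
    if (value == NULL) {                 /* NULL means "delete" */
        toy_error = "Cannot delete this attribute";
        return -1;
    }
    if (value->kind != TOY_STR) {
        toy_error = "The attribute value must be a string";
        return -1;
    }
    *field_ptr(self, closure) = value;   /* refcounting omitted here */
    return 0;
}

static size_t first_off = offsetof(ToyNoddy, first);
static size_t last_off = offsetof(ToyNoddy, last);

ToyGetSetDef toy_getseters[] = {
    {"first", toy_get_name_part, toy_set_name_part, &first_off},
    {"last",  toy_get_name_part, toy_set_name_part, &last_off},
    {NULL}                               /* sentinel */
};
```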
We create an array of "PyGetSetDef" structures:

   static PyGetSetDef Noddy_getseters[] = {
       {"first",
        (getter)Noddy_getfirst, (setter)Noddy_setfirst,
        "first name",
        NULL},
       {"last",
        (getter)Noddy_getlast, (setter)Noddy_setlast,
        "last name",
        NULL},
       {NULL}  /* Sentinel */
   };

and register it in the "tp_getset" slot:

   Noddy_getseters,           /* tp_getset */

The last item in a "PyGetSetDef" structure is the closure mentioned above. In this case, we aren’t using the closure, so we just pass *NULL*.

We also remove the member definitions for these attributes:

   static PyMemberDef Noddy_members[] = {
       {"number", T_INT, offsetof(Noddy, number), 0,
        "noddy number"},
       {NULL}  /* Sentinel */
   };

We also need to update the "tp_init" handler to only allow strings [3] to be passed. The "U" format unit accepts only "str" objects:

   static int
   Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
   {
       PyObject *first=NULL, *last=NULL, *tmp;

       static char *kwlist[] = {"first", "last", "number", NULL};

       if (! PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                         &first, &last,
                                         &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_DECREF(tmp);
       }

       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_DECREF(tmp);
       }

       return 0;
   }

With these changes, we can ensure that the "first" and "last" members are never *NULL*, so we can remove checks for *NULL* values in almost all cases. This means that most of the "Py_XDECREF()" calls can be converted to "Py_DECREF()" calls. The only place we can’t change these calls is in the deallocator, where there is the possibility that the initialization of these members failed in the constructor.

We also rename the module initialization function and module name in the initialization function, as we did before, and we add an extra definition to the "setup.py" file.
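A small model shows why the deallocator keeps "Py_XDECREF()". It uses invented names and is not the real API. A constructor that fails while creating the second member disposes of the half-built object. The deallocator therefore sees a NULL "last" member and must tolerate it. Here the constructor calls the deallocator directly, whereas "Noddy_new()" gets there through "Py_DECREF(self)".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted string stand-in. */
typedef struct { long refcnt; } ToyStr;

int live_strings = 0;        /* ToyStr objects currently allocated */
int fail_second_alloc = 0;   /* test knob: make the second member fail */

ToyStr *toy_str_new(void)
{
    ToyStr *s = malloc(sizeof *s);
    if (s != NULL) {
        s->refcnt = 1;
        live_strings++;
    }
    return s;
}

/* Like Py_DECREF: the argument must not be NULL. */
void toy_str_decref(ToyStr *s)
{
    assert(s != NULL && s->refcnt > 0);
    if (--s->refcnt == 0) {
        live_strings--;
        free(s);
    }
}

/* Like Py_XDECREF: NULL is silently ignored. */
void toy_str_xdecref(ToyStr *s)
{
    if (s != NULL)
        toy_str_decref(s);
}

typedef struct {
    long refcnt;
    ToyStr *first;
    ToyStr *last;
} ToyNoddy;

/* Like Noddy_dealloc(): members may be NULL if construction failed. */
void toy_noddy_dealloc(ToyNoddy *self)
{
    toy_str_xdecref(self->first);
    toy_str_xdecref(self->last);
    free(self);
}

/* Like Noddy_new(): if a member cannot be created, dispose of the
   half-built object and return NULL. */
ToyNoddy *toy_noddy_new(void)
{
    ToyNoddy *self = calloc(1, sizeof *self);
    if (self == NULL)
        return NULL;
    self->refcnt = 1;
    self->first = toy_str_new();
    if (self->first == NULL) {
        toy_noddy_dealloc(self);
        return NULL;
    }
    self->last = fail_second_alloc ? NULL : toy_str_new();
    if (self->last == NULL) {
        toy_noddy_dealloc(self);   /* self->last is NULL here */
        return NULL;
    }
    return self;
}
```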
Supporting cyclic garbage collection
------------------------------------

Python has a cyclic garbage collector that can identify unneeded objects even when their reference counts are not zero. This can happen when objects are involved in cycles. For example, consider:

   >>> l = []
   >>> l.append(l)
   >>> del l

In this example, we create a list that contains itself. When we delete it, it still has a reference from itself. Its reference count doesn’t drop to zero. Fortunately, Python’s cyclic garbage collector will eventually figure out that the list is garbage and free it.

In the second version of the "Noddy" example, we allowed any kind of object to be stored in the "first" or "last" attributes [4]. This means that "Noddy" objects can participate in cycles:

   >>> import noddy2
   >>> n = noddy2.Noddy()
   >>> l = [n]
   >>> n.first = l

This is pretty silly, but it gives us an excuse to add support for the cyclic garbage collector to the "Noddy" example. To support cyclic garbage collection, types need to fill two slots and set a class flag that enables these slots:

   #include <Python.h>
   #include "structmember.h"

   typedef struct {
       PyObject_HEAD
       PyObject *first;
       PyObject *last;
       int number;
   } Noddy;

   static int
   Noddy_traverse(Noddy *self, visitproc visit, void *arg)
   {
       int vret;

       if (self->first) {
           vret = visit(self->first, arg);
           if (vret != 0)
               return vret;
       }
       if (self->last) {
           vret = visit(self->last, arg);
           if (vret != 0)
               return vret;
       }

       return 0;
   }

   static int
   Noddy_clear(Noddy *self)
   {
       PyObject *tmp;

       tmp = self->first;
       self->first = NULL;
       Py_XDECREF(tmp);

       tmp = self->last;
       self->last = NULL;
       Py_XDECREF(tmp);

       return 0;
   }

   static void
   Noddy_dealloc(Noddy* self)
   {
       PyObject_GC_UnTrack(self);
       Noddy_clear(self);
       Py_TYPE(self)->tp_free((PyObject*)self);
   }

   static PyObject *
   Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
   {
       Noddy *self;

       self = (Noddy *)type->tp_alloc(type, 0);
       if (self != NULL) {
           self->first = PyUnicode_FromString("");
           if (self->first == NULL) {
               Py_DECREF(self);
               return NULL;
           }
           self->last = PyUnicode_FromString("");
           if (self->last == NULL) {
               Py_DECREF(self);
               return NULL;
           }

           self->number = 0;
       }

       return (PyObject *)self;
   }

   static int
   Noddy_init(Noddy *self, PyObject *args, PyObject *kwds)
   {
       PyObject *first=NULL, *last=NULL, *tmp;

       static char *kwlist[] = {"first", "last", "number", NULL};

       if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                         &first, &last,
                                         &self->number))
           return -1;

       if (first) {
           tmp = self->first;
           Py_INCREF(first);
           self->first = first;
           Py_XDECREF(tmp);
       }

       if (last) {
           tmp = self->last;
           Py_INCREF(last);
           self->last = last;
           Py_XDECREF(tmp);
       }

       return 0;
   }
   static PyMemberDef Noddy_members[] = {
       {"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
        "first name"},
       {"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
        "last name"},
       {"number", T_INT, offsetof(Noddy, number), 0,
        "noddy number"},
       {NULL}  /* Sentinel */
   };

   static PyObject *
   Noddy_name(Noddy* self)
   {
       if (self->first == NULL) {
           PyErr_SetString(PyExc_AttributeError, "first");
           return NULL;
       }

       if (self->last == NULL) {
           PyErr_SetString(PyExc_AttributeError, "last");
           return NULL;
       }

       return PyUnicode_FromFormat("%S %S", self->first, self->last);
   }

   static PyMethodDef Noddy_methods[] = {
       {"name", (PyCFunction)Noddy_name, METH_NOARGS,
        "Return the name, combining the first and last name"
       },
       {NULL}  /* Sentinel */
   };

   static PyTypeObject NoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "noddy4.Noddy",            /* tp_name */
       sizeof(Noddy),             /* tp_basicsize */
       0,                         /* tp_itemsize */
       (destructor)Noddy_dealloc, /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_as_async */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT |
           Py_TPFLAGS_BASETYPE |
           Py_TPFLAGS_HAVE_GC,    /* tp_flags */
       "Noddy objects",           /* tp_doc */
       (traverseproc)Noddy_traverse,   /* tp_traverse */
       (inquiry)Noddy_clear,      /* tp_clear */
       0,                         /* tp_richcompare */
       0,                         /* tp_weaklistoffset */
       0,                         /* tp_iter */
       0,                         /* tp_iternext */
       Noddy_methods,             /* tp_methods */
       Noddy_members,             /* tp_members */
       0,                         /* tp_getset */
       0,                         /* tp_base */
       0,                         /* tp_dict */
       0,                         /* tp_descr_get */
       0,                         /* tp_descr_set */
       0,                         /* tp_dictoffset */
       (initproc)Noddy_init,      /* tp_init */
       0,                         /* tp_alloc */
       Noddy_new,                 /* tp_new */
   };

   static PyModuleDef noddy4module = {
       PyModuleDef_HEAD_INIT,
       "noddy4",
       "Example module that creates an extension type.",
       -1,
       NULL, NULL, NULL, NULL, NULL
   };

   PyMODINIT_FUNC
   PyInit_noddy4(void)
   {
       PyObject* m;

       if (PyType_Ready(&NoddyType) < 0)
           return NULL;

       m = PyModule_Create(&noddy4module);
       if (m == NULL)
           return NULL;

       Py_INCREF(&NoddyType);
       PyModule_AddObject(m, "Noddy", (PyObject *)&NoddyType);
       return m;
   }

The traversal method provides access to subobjects that could participate in cycles:

   static int
   Noddy_traverse(Noddy *self, visitproc visit, void *arg)
   {
       int vret;

       if (self->first) {
           vret = visit(self->first, arg);
           if (vret != 0)
               return vret;
       }
       if (self->last) {
           vret = visit(self->last, arg);
           if (vret != 0)
               return vret;
       }

       return 0;
   }

For each subobject that can participate in cycles, we need to call the "visit()" function, which is passed to the traversal method. The "visit()" function takes as arguments the subobject and the extra argument *arg* passed to the traversal method. It returns an integer value that must be returned if it is non-zero.

Python provides a "Py_VISIT()" macro that automates calling visit functions. With "Py_VISIT()", "Noddy_traverse()" can be simplified:

   static int
   Noddy_traverse(Noddy *self, visitproc visit, void *arg)
   {
       Py_VISIT(self->first);
       Py_VISIT(self->last);
       return 0;
   }

Note:

  The "tp_traverse" implementation must name its arguments exactly *visit* and *arg* in order to use "Py_VISIT()". This is to encourage uniformity across these boring implementations.

We also need to provide a method for clearing any subobjects that can participate in cycles.

   static int
   Noddy_clear(Noddy *self)
   {
       PyObject *tmp;

       tmp = self->first;
       self->first = NULL;
       Py_XDECREF(tmp);

       tmp = self->last;
       self->last = NULL;
       Py_XDECREF(tmp);

       return 0;
   }

Notice the use of a temporary variable in "Noddy_clear()". We use the temporary variable so that we can set each member to *NULL* before decrementing its reference count. We do this because, as was discussed earlier, if the reference count drops to zero, we might cause code to run that calls back into the object. In addition, because we now support garbage collection, we also have to worry about code being run that triggers garbage collection.
If garbage collection is run, our "tp_traverse" handler could get called. We can’t take a chance of having "Noddy_traverse()" called when a member’s reference count has dropped to zero and its value hasn’t been set to *NULL*.

Python provides a "Py_CLEAR()" macro that automates the careful decrementing of reference counts. With "Py_CLEAR()", the "Noddy_clear()" function can be simplified:

   static int
   Noddy_clear(Noddy *self)
   {
       Py_CLEAR(self->first);
       Py_CLEAR(self->last);
       return 0;
   }

Note that "Noddy_dealloc()" may call arbitrary functions through a "__del__" method or a weakref callback, which means the cyclic garbage collector can be triggered inside the function. Since the collector assumes that the reference count of a tracked object is not zero, we need to untrack the object by calling "PyObject_GC_UnTrack()" before clearing its members. Here is a reimplemented deallocator that uses "PyObject_GC_UnTrack()" and "Noddy_clear()":

   static void
   Noddy_dealloc(Noddy* self)
   {
       PyObject_GC_UnTrack(self);
       Noddy_clear(self);
       Py_TYPE(self)->tp_free((PyObject*)self);
   }

Finally, we add the "Py_TPFLAGS_HAVE_GC" flag to the class flags:

   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, /* tp_flags */

That’s pretty much it. If we had written custom "tp_alloc" or "tp_free" slots, we’d need to modify them for cyclic garbage collection. Most extensions will use the versions automatically provided.
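Both GC handlers can be exercised without a Python build. The following is a plain-C model, not CPython code: mock_obj, mock_decref(), mock_noddy_traverse() and mock_noddy_clear() are hypothetical stand-ins for PyObject, Py_DECREF(), Noddy_traverse() and Noddy_clear(). It illustrates the two rules discussed above. First, a traversal propagates the first non-zero result from visit. Second, a clear detaches each member before dropping the reference, so any code triggered by the decrement already sees *NULL*.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, Python-free model. mock_obj plays PyObject and
   mock_decref() plays Py_DECREF(); when the count reaches zero, its
   "destructor" runs arbitrary code that looks back at its owner. */
typedef struct mock_obj mock_obj;
typedef struct mock_noddy mock_noddy;
typedef int (*mock_visitproc)(mock_obj *child, void *arg);

struct mock_obj {
    int refcnt;
    mock_noddy *watcher;     /* owner inspected when refcnt hits 0 */
    int saw_null_member;     /* what that inspection found */
};

struct mock_noddy {
    mock_obj *first;         /* either member may be NULL */
    mock_obj *last;
};

static void mock_decref(mock_obj *o)
{
    if (o == NULL)
        return;
    assert(o->refcnt > 0);
    if (--o->refcnt == 0 && o->watcher != NULL)
        o->saw_null_member = (o->watcher->first == NULL);
}

/* Like the long-hand Noddy_traverse(): visit each non-NULL member and
   return the first non-zero result unchanged. */
static int mock_noddy_traverse(mock_noddy *self, mock_visitproc visit, void *arg)
{
    int vret;
    if (self->first && (vret = visit(self->first, arg)) != 0)
        return vret;
    if (self->last && (vret = visit(self->last, arg)) != 0)
        return vret;
    return 0;
}

/* Like Noddy_clear() / Py_CLEAR(): set the member to NULL first, then
   drop the reference that may run arbitrary code. */
static int mock_noddy_clear(mock_noddy *self)
{
    mock_obj *tmp = self->first;
    self->first = NULL;
    mock_decref(tmp);

    tmp = self->last;
    self->last = NULL;
    mock_decref(tmp);
    return 0;
}

/* Example visitor: counts the members it is shown. */
static int mock_count_visit(mock_obj *child, void *arg)
{
    assert(child != NULL);   /* traversal never passes NULL */
    ++*(int *)arg;
    return 0;
}
```

When the last reference to the first member is dropped, its model "destructor" inspects the owner and finds the slot already *NULL*. That is the guarantee the temporary variable, or "Py_CLEAR()", provides.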
Subclassing other types
-----------------------

It is possible to create new extension types that are derived from existing types. It is easiest to inherit from the built-in types, since an extension can easily use the "PyTypeObject" it needs. It can be difficult to share these "PyTypeObject" structures between extension modules.

In this example we will create a "Shoddy" type that inherits from the built-in "list" type. The new type will be completely compatible with regular lists, but will have an additional "increment()" method that increases an internal counter:

   >>> import shoddy
   >>> s = shoddy.Shoddy(range(3))
   >>> s.extend(s)
   >>> print(len(s))
   6
   >>> print(s.increment())
   1
   >>> print(s.increment())
   2

   #include <Python.h>

   typedef struct {
       PyListObject list;
       int state;
   } Shoddy;
   static PyObject *
   Shoddy_increment(Shoddy *self, PyObject *unused)
   {
       self->state++;
       return PyLong_FromLong(self->state);
   }
   static PyMethodDef Shoddy_methods[] = {
       {"increment", (PyCFunction)Shoddy_increment, METH_NOARGS,
        PyDoc_STR("increment state counter")},
       {NULL, NULL},
   };

   static int
   Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
   {
       if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
           return -1;
       self->state = 0;
       return 0;
   }
   static PyTypeObject ShoddyType = {
       PyVarObject_HEAD_INIT(NULL, 0)
       "shoddy.Shoddy",           /* tp_name */
       sizeof(Shoddy),            /* tp_basicsize */
       0,                         /* tp_itemsize */
       0,                         /* tp_dealloc */
       0,                         /* tp_print */
       0,                         /* tp_getattr */
       0,                         /* tp_setattr */
       0,                         /* tp_reserved */
       0,                         /* tp_repr */
       0,                         /* tp_as_number */
       0,                         /* tp_as_sequence */
       0,                         /* tp_as_mapping */
       0,                         /* tp_hash */
       0,                         /* tp_call */
       0,                         /* tp_str */
       0,                         /* tp_getattro */
       0,                         /* tp_setattro */
       0,                         /* tp_as_buffer */
       Py_TPFLAGS_DEFAULT |
           Py_TPFLAGS_BASETYPE,   /* tp_flags */
       0,                         /* tp_doc */
       0,                         /* tp_traverse */
       0,                         /* tp_clear */
       0,                         /* tp_richcompare */
       0,                         /* tp_weaklistoffset */
       0,                         /* tp_iter */
       0,                         /* tp_iternext */
       Shoddy_methods,            /* tp_methods */
       0,                         /* tp_members */
       0,                         /* tp_getset */
       0,                         /* tp_base */
       0,                         /* tp_dict */
       0,                         /* tp_descr_get */
       0,                         /* tp_descr_set */
       0,                         /* tp_dictoffset */
       (initproc)Shoddy_init,     /* tp_init */
       0,                         /* tp_alloc */
       0,                         /* tp_new */
   };

   static PyModuleDef shoddymodule = {
       PyModuleDef_HEAD_INIT,
       "shoddy",
       "Shoddy module",
       -1,
       NULL, NULL, NULL, NULL, NULL
   };

   PyMODINIT_FUNC
   PyInit_shoddy(void)
   {
       PyObject *m;

       ShoddyType.tp_base = &PyList_Type;
       if (PyType_Ready(&ShoddyType) < 0)
           return NULL;

       m = PyModule_Create(&shoddymodule);
       if (m == NULL)
           return NULL;

       Py_INCREF(&ShoddyType);
       PyModule_AddObject(m, "Shoddy", (PyObject *) &ShoddyType);
       return m;
   }

As you can see, the source code closely resembles the "Noddy" examples in previous sections. We will break down the main differences between them.

   typedef struct {
       PyListObject list;
       int state;
   } Shoddy;

The primary difference for derived type objects is that the base type’s object structure must be the first value. The base type will already include the "PyObject_HEAD" at the beginning of its structure.

When a Python object is a "Shoddy" instance, its *PyObject** pointer can be safely cast to both *PyListObject** and *Shoddy**.
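The layout rule can be checked in plain C. In this minimal sketch, mock_head, mock_list and mock_shoddy are hypothetical stand-ins for PyObject, PyListObject and Shoddy; they are not the real structures. Because each struct begins with its base, the same pointer can be handed to code written against any of the three layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: mock_head plays PyObject, mock_list plays
   PyListObject and mock_shoddy plays Shoddy. */
typedef struct { long refcnt; const char *type_name; } mock_head;
typedef struct { mock_head head; long size; } mock_list;
typedef struct { mock_list list; int state; } mock_shoddy;

/* The base struct must be the first member, so all casts below are valid. */
_Static_assert(offsetof(mock_shoddy, list) == 0, "base must come first");
_Static_assert(offsetof(mock_list, head) == 0, "header must come first");

/* Code written against the base (list) layout only. */
static long mock_list_size(mock_head *obj)
{
    assert(obj != NULL);
    return ((mock_list *)obj)->size;
}

/* Code that knows it has the derived (Shoddy-like) type. */
static int mock_shoddy_increment(mock_head *obj)
{
    assert(obj != NULL);
    return ++((mock_shoddy *)obj)->state;
}
```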
   static int
   Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
   {
       if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
           return -1;
       self->state = 0;
       return 0;
   }

In the "__init__" method for our type, we can see how to call through to the "__init__" method of the base type.

This pattern is important when writing a type with custom "new" and "dealloc" methods. The "new" method should not allocate the memory for the object itself with "tp_alloc"; that will be handled by the base class when its "tp_new" is called.

When filling out the "PyTypeObject" for the "Shoddy" type, you see a slot for "tp_base". Due to cross-platform compiler issues, you can’t fill that field directly with a reference to "PyList_Type"; it can be done later in the module initialization function, "PyInit_shoddy()":

   PyMODINIT_FUNC
   PyInit_shoddy(void)
   {
       PyObject *m;

       ShoddyType.tp_base = &PyList_Type;
       if (PyType_Ready(&ShoddyType) < 0)
           return NULL;

       m = PyModule_Create(&shoddymodule);
       if (m == NULL)
           return NULL;

       Py_INCREF(&ShoddyType);
       PyModule_AddObject(m, "Shoddy", (PyObject *) &ShoddyType);
       return m;
   }

Before calling "PyType_Ready()", the type structure must have the "tp_base" slot filled in. When we are deriving from an existing type other than "object", it is not necessary to fill in the "tp_new" slot with "PyType_GenericNew()" as the basic "Noddy" example did: "PyType_Ready()" copies the base type’s "tp_new" (and its "tp_alloc") into the new type.

After that, calling "PyType_Ready()" and adding the type object to the module is the same as with the basic "Noddy" examples.
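To make the inheritance step concrete, here is a heavily simplified model of assigning the base at run time and then making the type ready. mock_type and mock_type_ready() are hypothetical. The real "PyType_Ready()" inherits many more slots and follows extra rules. For example, a static type whose base is "object" does not inherit "tp_new", which is why the basic "Noddy" example sets it explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified type object: a name, a base pointer
   and two slots. This is NOT PyTypeObject, and mock_type_ready() is NOT
   PyType_Ready(). */
typedef struct mock_type mock_type;
typedef const char *(*mock_slot)(void);

struct mock_type {
    const char *name;
    mock_type *base;
    mock_slot new_slot;     /* plays tp_new  */
    mock_slot init_slot;    /* plays tp_init */
};

static const char *list_new(void)    { return "list_new"; }
static const char *list_init(void)   { return "list_init"; }
static const char *shoddy_init(void) { return "shoddy_init"; }

static mock_type mock_list_type = { "list", NULL, list_new, list_init };

/* The base pointer is left NULL in the static initializer and filled in
   at run time, the way PyInit_shoddy() assigns tp_base. */
static mock_type mock_shoddy_type = { "shoddy.Shoddy", NULL, NULL, shoddy_init };

/* "Ready" step: copy every slot the type left NULL from its base. */
static int mock_type_ready(mock_type *t)
{
    if (t->base != NULL) {
        if (t->new_slot == NULL)
            t->new_slot = t->base->new_slot;
        if (t->init_slot == NULL)
            t->init_slot = t->base->init_slot;
    }
    assert(t->new_slot != NULL);  /* otherwise no instances could be made */
    return 0;
}
```

After the ready step, the derived type creates instances with its base’s "new" slot but keeps its own "init" slot.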
Type Methods
============

This section aims to give a quick fly-by on the various type methods you can implement and what they do.

Here is the definition of "PyTypeObject", with some fields only used in debug builds omitted:

   typedef struct _typeobject {
       PyObject_VAR_HEAD
       const char *tp_name; /* For printing, in format "<module>.<name>" */
       Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

       /* Methods to implement standard operations */

       destructor tp_dealloc;
       printfunc tp_print;
       getattrfunc tp_getattr;
       setattrfunc tp_setattr;
       PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
                                       or tp_reserved (Python 3) */
       reprfunc tp_repr;

       /* Method suites for standard classes */

       PyNumberMethods *tp_as_number;
       PySequenceMethods *tp_as_sequence;
       PyMappingMethods *tp_as_mapping;

       /* More standard operations (here for binary compatibility) */

       hashfunc tp_hash;
       ternaryfunc tp_call;
       reprfunc tp_str;
       getattrofunc tp_getattro;
       setattrofunc tp_setattro;

       /* Functions to access object as input/output buffer */
       PyBufferProcs *tp_as_buffer;

       /* Flags to define presence of optional/expanded features */
       unsigned long tp_flags;

       const char *tp_doc; /* Documentation string */

       /* call function for all accessible objects */
       traverseproc tp_traverse;

       /* delete references to contained objects */
       inquiry tp_clear;

       /* rich comparisons */
       richcmpfunc tp_richcompare;

       /* weak reference enabler */
       Py_ssize_t tp_weaklistoffset;

       /* Iterators */
       getiterfunc tp_iter;
       iternextfunc tp_iternext;

       /* Attribute descriptor and subclassing stuff */
       struct PyMethodDef *tp_methods;
       struct PyMemberDef *tp_members;
       struct PyGetSetDef *tp_getset;
       struct _typeobject *tp_base;
       PyObject *tp_dict;
       descrgetfunc tp_descr_get;
       descrsetfunc tp_descr_set;
       Py_ssize_t tp_dictoffset;
       initproc tp_init;
       allocfunc tp_alloc;
       newfunc tp_new;
       freefunc tp_free; /* Low-level free-memory routine */
       inquiry tp_is_gc; /* For PyObject_IS_GC */
       PyObject *tp_bases;
       PyObject *tp_mro; /* method resolution order */
       PyObject *tp_cache;
       PyObject *tp_subclasses;
       PyObject *tp_weaklist;
       destructor tp_del;

       /* Type attribute cache version tag. Added in version 2.6 */
       unsigned int tp_version_tag;

       destructor tp_finalize;

   } PyTypeObject;

Now that’s a *lot* of methods. Don’t worry too much, though: if you have a type you want to define, the chances are very good that you will only implement a handful of these.

As you probably expect by now, we’re going to go over this and give more information about the various handlers. We won’t go in the order they are defined in the structure, because there is a lot of historical baggage that impacts the ordering of the fields; be sure your type initialization keeps the fields in the right order! It’s often easiest to find an example that includes all the fields you need (even if they’re initialized to "0") and then change the values to suit your new type.

   const char *tp_name; /* For printing */

The name of the type. As mentioned in the last section, this will appear in various places, almost entirely for diagnostic purposes. Try to choose something that will be helpful in such a situation!

   Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

These fields tell the runtime how much memory to allocate when new objects of this type are created. Python has some built-in support for variable length structures (think: strings, lists), which is where the "tp_itemsize" field comes in. This will be dealt with later.

   const char *tp_doc;

Here you can put a string (or its address) that you want returned when the Python script references "obj.__doc__" to retrieve the doc string.

Now we come to the basic type methods: the ones most extension types will implement.
Finalization and De-allocation
------------------------------

   destructor tp_dealloc;

This function is called when the reference count of the instance of your type is reduced to zero and the Python interpreter wants to reclaim it. If your type has memory to free or other clean-up to perform, you can put it here. The object itself needs to be freed here as well. Here is an example of this function:

   static void
   newdatatype_dealloc(newdatatypeobject * obj)
   {
       free(obj->obj_UnderlyingDatatypePtr);
       Py_TYPE(obj)->tp_free(obj);
   }

One important requirement of the deallocator function is that it leaves any pending exceptions alone. This is important since deallocators are frequently called as the interpreter unwinds the Python stack; when the stack is unwound due to an exception (rather than normal returns), nothing is done to protect the deallocators from seeing that an exception has already been set. Any actions which a deallocator performs which may cause additional Python code to be executed may detect that an exception has been set. This can lead to misleading errors from the interpreter. The proper way to protect against this is to save a pending exception before performing the unsafe action, and restoring it when done. This can be done using the "PyErr_Fetch()" and "PyErr_Restore()" functions:

   static void
   my_dealloc(PyObject *obj)
   {
       MyObject *self = (MyObject *) obj;
       PyObject *cbresult;

       if (self->my_callback != NULL) {
           PyObject *err_type, *err_value, *err_traceback;

           /* This saves the current exception state */
           PyErr_Fetch(&err_type, &err_value, &err_traceback);

           cbresult = PyObject_CallObject(self->my_callback, NULL);
           if (cbresult == NULL)
               PyErr_WriteUnraisable(self->my_callback);
           else
               Py_DECREF(cbresult);

           /* This restores the saved exception state */
           PyErr_Restore(err_type, err_value, err_traceback);

           Py_DECREF(self->my_callback);
       }
       Py_TYPE(obj)->tp_free((PyObject*)self);
   }

Note:

   There are limitations to what you can safely do in a deallocator function.
   First, if your type supports garbage collection (using "tp_traverse" and/or "tp_clear"), some of the object’s members may already have been cleared or finalized by the time "tp_dealloc" is called. Second, in "tp_dealloc", your object is in an unstable state: its reference count is equal to zero. Any call to a non-trivial object or API (as in the example above) might end up calling "tp_dealloc" again, causing a double free and a crash.

   Starting with Python 3.4, it is recommended not to put any complex finalization code in "tp_dealloc", and instead use the new "tp_finalize" type method.

   See also: **PEP 442** explains the new finalization scheme.
Object Presentation
-------------------

In Python, there are two ways to generate a textual representation of an object: the "repr()" function, and the "str()" function. (The "print()" function just calls "str()".) These handlers are both optional.

   reprfunc tp_repr;
   reprfunc tp_str;

The "tp_repr" handler should return a string object containing a representation of the instance for which it is called. Here is a simple example:

   static PyObject *
   newdatatype_repr(newdatatypeobject * obj)
   {
       return PyUnicode_FromFormat("Repr-ified_newdatatype{size:%d}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }

If no "tp_repr" handler is specified, the interpreter will supply a representation that uses the type’s "tp_name" and a uniquely identifying value for the object.

The "tp_str" handler is to "str()" what the "tp_repr" handler described above is to "repr()"; that is, it is called when Python code calls "str()" on an instance of your object. Its implementation is very similar to the "tp_repr" function, but the resulting string is intended for human consumption. If "tp_str" is not specified, the "tp_repr" handler is used instead.

Here is a simple example:

   static PyObject *
   newdatatype_str(newdatatypeobject * obj)
   {
       return PyUnicode_FromFormat("Stringified_newdatatype{size:%d}",
                                   obj->obj_UnderlyingDatatypePtr->size);
   }
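The fallback from "str()" to "repr()" can be modelled as a dispatch on two optional function pointers. This is a hypothetical plain-C sketch: mock_type, mock_repr(), mock_str() and the example handlers are illustrative names, not CPython API. In CPython, the same effect comes from the inherited "object.__str__" calling "repr()". A default "tp_repr" is always available there, which the sketch approximates with an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two presentation slots; not CPython API. */
typedef struct mock_obj mock_obj;
typedef const char *(*mock_reprfunc)(const mock_obj *);

typedef struct {
    const char *tp_name;
    mock_reprfunc tp_repr;   /* assumed always set, see mock_repr() */
    mock_reprfunc tp_str;    /* optional: may be NULL */
} mock_type;

struct mock_obj { const mock_type *ob_type; };

/* repr(): CPython always has some tp_repr to call (a default one is
   inherited), so the model just asserts that one is present. */
static const char *mock_repr(const mock_obj *o)
{
    assert(o->ob_type->tp_repr != NULL);
    return o->ob_type->tp_repr(o);
}

/* str(): use tp_str when the type provides one, else fall back to repr(). */
static const char *mock_str(const mock_obj *o)
{
    if (o->ob_type->tp_str != NULL)
        return o->ob_type->tp_str(o);
    return mock_repr(o);
}

/* Example handlers for a hypothetical type. */
static const char *example_repr(const mock_obj *o) { (void)o; return "Repr-ified"; }
static const char *example_str(const mock_obj *o)  { (void)o; return "Stringified"; }
```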
Attribute Management
--------------------

For every object which can support attributes, the corresponding type must provide the functions that control how the attributes are resolved. There needs to be a function which can retrieve attributes (if any are defined), and another to set attributes (if setting attributes is allowed). Removing an attribute is a special case, for which the new value passed to the handler is *NULL*.

Python supports two pairs of attribute handlers; a type that supports attributes only needs to implement the functions for one pair. The difference is that one pair takes the name of the attribute as a "char*", while the other accepts a "PyObject*". Each type can use whichever pair makes more sense for the implementation’s convenience.

   getattrfunc  tp_getattr;        /* char * version */
   setattrfunc  tp_setattr;
   /* ... */
   getattrofunc tp_getattro;       /* PyObject * version */
   setattrofunc tp_setattro;

If accessing attributes of an object is always a simple operation (this will be explained shortly), there are generic implementations which can be used to provide the "PyObject*" version of the attribute management functions. The actual need for type-specific attribute handlers almost completely disappeared starting with Python 2.2, though there are many examples which have not been updated to use some of the new generic mechanism that is available.
Generic Attribute Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most extension types only use *simple* attributes. So, what makes the attributes simple? There are only a couple of conditions that must be met:

1. The name of the attributes must be known when "PyType_Ready()" is called.

2. No special processing is needed to record that an attribute was looked up or set, nor do actions need to be taken based on the value.

Note that this list does not place any restrictions on the values of the attributes, when the values are computed, or how relevant data is stored.

When "PyType_Ready()" is called, it uses three tables referenced by the type object to create *descriptors* which are placed in the dictionary of the type object. Each descriptor controls access to one attribute of the instance object. Each of the tables is optional; if all three are *NULL*, instances of the type will only have attributes that are inherited from their base type, and should leave the "tp_getattro" and "tp_setattro" fields *NULL* as well, allowing the base type to handle attributes.

The tables are declared as three fields of the type object:

   struct PyMethodDef *tp_methods;
   struct PyMemberDef *tp_members;
   struct PyGetSetDef *tp_getset;

If "tp_methods" is not *NULL*, it must refer to an array of "PyMethodDef" structures. Each entry in the table is an instance of this structure:

   typedef struct PyMethodDef {
       const char  *ml_name;       /* method name */
       PyCFunction  ml_meth;       /* implementation function */
       int          ml_flags;      /* flags */
       const char  *ml_doc;        /* docstring */
   } PyMethodDef;

One entry should be defined for each method provided by the type; no entries are needed for methods inherited from a base type. One additional entry is needed at the end; it is a sentinel that marks the end of the array. The "ml_name" field of the sentinel must be *NULL*.

The second table is used to define attributes which map directly to data stored in the instance.
A variety of primitive C types are supported, and access may be read-only or read-write. The structures in the table are defined as:

   typedef struct PyMemberDef {
       char *name;
       int   type;
       Py_ssize_t offset;
       int   flags;
       char *doc;
   } PyMemberDef;

For each entry in the table, a *descriptor* will be constructed and added to the type which will be able to extract a value from the instance structure. The "type" field should contain one of the type codes defined in the "structmember.h" header; the value will be used to determine how to convert Python values to and from C values. The "flags" field is used to store flags which control how the attribute can be accessed.

The following flag constants are defined in "structmember.h"; they may be combined using bitwise-OR.

+-------------------------+------------------------------------------------+
| Constant                | Meaning                                        |
+=========================+================================================+
| "READONLY"              | Never writable.                                |
+-------------------------+------------------------------------------------+
| "READ_RESTRICTED"       | Not readable in restricted mode.               |
+-------------------------+------------------------------------------------+
| "PY_WRITE_RESTRICTED"   | Not writable in restricted mode.               |
+-------------------------+------------------------------------------------+
| "RESTRICTED"            | Not readable or writable in restricted mode.   |
+-------------------------+------------------------------------------------+

An interesting advantage of using the "tp_members" table to build descriptors that are used at runtime is that any attribute defined this way can have an associated doc string simply by providing the text in the table. An application can use the introspection API to retrieve the descriptor from the class object, and get the doc string using its "__doc__" attribute.

As with the "tp_methods" table, a sentinel entry with a "name" value of *NULL* is required.
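The offset-based access behind "tp_members" can be sketched without "Python.h". In this hypothetical model, mock_memberdef mirrors the name, type, offset and flags fields of "PyMemberDef". Only an int type code is supported, and "READONLY" is modelled by a local MOCK_READONLY flag. The generic getter and setter locate the field purely from the byte offset recorded with "offsetof", and the lookup stops at the *NULL* sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a tp_members table; not the real PyMemberDef. */
enum { MOCK_T_INT = 1 };   /* stands in for T_INT */
#define MOCK_READONLY 1    /* stands in for READONLY */

typedef struct {
    const char *name;
    int type;
    size_t offset;         /* byte offset of the field in the instance */
    int flags;
} mock_memberdef;

typedef struct {
    long refcnt;           /* stands in for PyObject_HEAD */
    int number;
    int serial;
} mock_noddy;

static const mock_memberdef mock_members[] = {
    {"number", MOCK_T_INT, offsetof(mock_noddy, number), 0},
    {"serial", MOCK_T_INT, offsetof(mock_noddy, serial), MOCK_READONLY},
    {NULL, 0, 0, 0}        /* Sentinel */
};

/* Find an entry by name, stopping at the sentinel. */
static const mock_memberdef *mock_find_member(const char *name)
{
    const mock_memberdef *m;
    for (m = mock_members; m->name != NULL; m++)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

/* Generic getter: the offset alone locates the field. */
static int mock_member_get_int(const void *obj, const mock_memberdef *m)
{
    assert(m->type == MOCK_T_INT);
    return *(const int *)((const char *)obj + m->offset);
}

/* Generic setter: returns 0 on success, -1 if the member is read-only. */
static int mock_member_set_int(void *obj, const mock_memberdef *m, int value)
{
    assert(m->type == MOCK_T_INT);
    if (m->flags & MOCK_READONLY)
        return -1;
    *(int *)((char *)obj + m->offset) = value;
    return 0;
}
```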
Type-specific Attribute Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For simplicity, only the "char*" version will be demonstrated here; the type of the name parameter is the only difference between the "char*" and "PyObject*" flavors of the interface. This example effectively does the same thing as the generic example above, but does not use the generic support added in Python 2.2. It explains how the handler functions are called, so that if you do need to extend their functionality, you’ll understand what needs to be done.

The "tp_getattr" handler is called when the object requires an attribute look-up. It is called in the same situations where the "__getattr__()" method of a class would be called.

Here is an example:

   static PyObject *
   newdatatype_getattr(newdatatypeobject *obj, char *name)
   {
       if (strcmp(name, "data") == 0)
       {
           return PyLong_FromLong(obj->data);
       }

       PyErr_Format(PyExc_AttributeError,
                    "'%.50s' object has no attribute '%.400s'",
                    Py_TYPE(obj)->tp_name, name);
       return NULL;
   }

The "tp_setattr" handler is called when the "__setattr__()" or "__delattr__()" method of a class instance would be called. When an attribute should be deleted, the third parameter will be *NULL*. Here is an example that simply raises an exception; if this were really all you wanted, the "tp_setattr" handler should be set to *NULL*.

   static int
   newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
   {
       (void)PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
       return -1;
   }
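The *NULL*-means-delete convention for "tp_setattr" can be shown with a small plain-C model. mock_datatype and mock_setattr() are hypothetical names. Where the sketch simply returns -1, a real handler would also set an exception, for example with "PyErr_Format()", before returning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instance with one settable int attribute, "data". */
typedef struct { long refcnt; int data; } mock_datatype;

/* Returns 0 on success and -1 on error, following the tp_setattr
   convention. A NULL `value` means "delete the attribute", which this
   type refuses. */
static int mock_setattr(mock_datatype *obj, const char *name, const int *value)
{
    assert(obj != NULL && name != NULL);
    if (strcmp(name, "data") != 0)
        return -1;              /* unknown name: real code sets an exception */
    if (value == NULL)
        return -1;              /* deletion refused: real code sets an exception */
    obj->data = *value;
    return 0;
}
```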
Object Comparison
-----------------

   richcmpfunc tp_richcompare;

The "tp_richcompare" handler is called when comparisons are needed. It is analogous to the rich comparison methods, like "__lt__()", and also called by "PyObject_RichCompare()" and "PyObject_RichCompareBool()".

This function is called with two Python objects and the operator as arguments, where the operator is one of "Py_EQ", "Py_NE", "Py_LE", "Py_GE", "Py_LT" or "Py_GT". It should compare the two objects with respect to the specified operator and return "Py_True" or "Py_False" if the comparison is successful, "Py_NotImplemented" to indicate that comparison is not implemented and the other object’s comparison method should be tried, or *NULL* if an exception was set.

Here is a sample implementation, for a datatype that is considered equal if the size of an internal pointer is equal:

   static PyObject *
   newdatatype_richcmp(PyObject *obj1, PyObject *obj2, int op)
   {
       PyObject *result;
       int c, size1, size2;

       /* code to make sure that both arguments are of type
          newdatatype omitted */

       size1 = ((newdatatypeobject *)obj1)->obj_UnderlyingDatatypePtr->size;
       size2 = ((newdatatypeobject *)obj2)->obj_UnderlyingDatatypePtr->size;

       switch (op) {
       case Py_LT: c = size1 <  size2; break;
       case Py_LE: c = size1 <= size2; break;
       case Py_EQ: c = size1 == size2; break;
       case Py_NE: c = size1 != size2; break;
       case Py_GT: c = size1 >  size2; break;
       case Py_GE: c = size1 >= size2; break;
       default:
           /* unknown operator: let Python try the other operand */
           Py_RETURN_NOTIMPLEMENTED;
       }
       result = c ? Py_True : Py_False;
       Py_INCREF(result);
       return result;
   }
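The six-way switch above can also be written by first reducing the two sizes to a three-way comparison. This plain-C sketch uses a hypothetical local enum whose values (0 through 5) match "Py_LT" through "Py_GE" in CPython’s "object.h". It returns -1 for an unknown operator, where a real handler would return "Py_NotImplemented".

```c
#include <assert.h>

/* Local stand-ins for Py_LT ... Py_GE; the values 0-5 match CPython's
   object.h, but this sketch does not include Python.h. */
enum { MOCK_LT, MOCK_LE, MOCK_EQ, MOCK_NE, MOCK_GT, MOCK_GE };

/* Returns 1 (true), 0 (false), or -1 for an unknown operator. */
static int mock_compare_sizes(long size1, long size2, int op)
{
    /* Three-way result: -1, 0 or 1, without risking overflow. */
    int c = (size1 > size2) - (size1 < size2);

    switch (op) {
    case MOCK_LT: return c <  0;
    case MOCK_LE: return c <= 0;
    case MOCK_EQ: return c == 0;
    case MOCK_NE: return c != 0;
    case MOCK_GT: return c >  0;
    case MOCK_GE: return c >= 0;
    default:      return -1;
    }
}
```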
Abstract Protocol Support
-------------------------

Python supports a variety of *abstract* ‘protocols’; the specific interfaces provided to use these protocols are documented in Abstract Objects Layer.

A number of these abstract interfaces were defined early in the development of the Python implementation. In particular, the number, mapping, and sequence protocols have been part of Python since the beginning. Other protocols have been added over time. For protocols which depend on several handler routines from the type implementation, the older protocols have been defined as optional blocks of handlers referenced by the type object. For newer protocols there are additional slots in the main type object, with a flag bit being set to indicate that the slots are present and should be checked by the interpreter. (The flag bit does not indicate that the slot values are non-*NULL*. The flag may be set to indicate the presence of a slot, but a slot may still be unfilled.)

   PyNumberMethods   *tp_as_number;
   PySequenceMethods *tp_as_sequence;
   PyMappingMethods  *tp_as_mapping;

If you wish your object to be able to act like a number, a sequence, or a mapping object, then you place the address of a structure that implements the C type "PyNumberMethods", "PySequenceMethods", or "PyMappingMethods", respectively. It is up to you to fill in this structure with appropriate values. You can find examples of the use of each of these in the "Objects" directory of the Python source distribution.

   hashfunc tp_hash;

This function, if you choose to provide it, should return a hash number for an instance of your data type.
Because a return value of -1 signals an error, it must never return -1 for a valid object. Here is a moderately pointless example:

   static Py_hash_t
   newdatatype_hash(newdatatypeobject *obj)
   {
       Py_hash_t result;
       result = obj->obj_UnderlyingDatatypePtr->size;
       result = result * 3;
       if (result == -1)
           result = -2;   /* -1 is reserved for signalling errors */
       return result;
   }

   ternaryfunc tp_call;

This function is called when an instance of your data type is “called”; for example, if "obj1" is an instance of your data type and the Python script contains "obj1('hello')", the "tp_call" handler is invoked.

This function takes three arguments:

1. The first argument (*obj* in the example below) is the instance of the data type which is the subject of the call. If the call is "obj1('hello')", then it is "obj1".

2. The second argument (*args*) is a tuple containing the positional arguments to the call. You can use "PyArg_ParseTuple()" to extract the arguments.

3. The third argument (*kwds*) is a dictionary of keyword arguments that were passed, or *NULL*. If this is non-*NULL* and you support keyword arguments, use "PyArg_ParseTupleAndKeywords()" to extract the arguments. If you do not want to support keyword arguments and this is non-*NULL*, raise a "TypeError" with a message saying that keyword arguments are not supported.

Here is a desultory example of the implementation of the call function:

   /* Implement the call function.
    *    obj is the instance receiving the call.
    *    args is a tuple containing the arguments to the call, in this
    *    case 3 strings.
    *    kwds holds any keyword arguments; this example rejects them.
    */
   static PyObject *
   newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *kwds)
   {
       PyObject *result;
       char *arg1;
       char *arg2;
       char *arg3;

       if (kwds != NULL && PyDict_Size(kwds) > 0) {
           PyErr_SetString(PyExc_TypeError,
                           "call() takes no keyword arguments");
           return NULL;
       }
       if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
           return NULL;
       }
       result = PyUnicode_FromFormat(
           "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
           obj->obj_UnderlyingDatatypePtr->size,
           arg1, arg2, arg3);
       return result;
   }

   /* Iterators */
   getiterfunc tp_iter;
   iternextfunc tp_iternext;

These functions provide support for the iterator protocol.
Any object which wishes to support iteration over its contents (which may be generated during iteration) must implement the "tp_iter" handler. Objects which are returned by a "tp_iter" handler must implement both the "tp_iter" and "tp_iternext" handlers. Both handlers take exactly one parameter, the instance for which they are being called, and return a new reference. In the case of an error, they should set an exception and return *NULL*.

For an object which represents an iterable collection, the "tp_iter" handler must return an iterator object. The iterator object is responsible for maintaining the state of the iteration. For collections which can support multiple iterators which do not interfere with each other (as lists and tuples do), a new iterator should be created and returned. Objects which can only be iterated over once (usually due to side effects of iteration) should implement this handler by returning a new reference to themselves, and should also implement the "tp_iternext" handler. File objects are an example of such an iterator.

Iterator objects should implement both handlers. The "tp_iter" handler should return a new reference to the iterator (this is the same as the "tp_iter" handler for objects which can only be iterated over destructively). The "tp_iternext" handler should return a new reference to the next object in the iteration if there is one. If the iteration has reached the end, it may return *NULL* without setting an exception or it may set "StopIteration"; avoiding the exception can yield slightly better performance. If an actual error occurs, it should set an exception and return *NULL*.
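The difference between a collection and its iterators can be modelled in plain C. In this hypothetical sketch, mock_collection_iter() plays the role of the collection’s "tp_iter" and returns a fresh, independent iterator each time. mock_iter_next() plays "tp_iternext" and returns *NULL* once the items are exhausted, without recording any error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a collection hands out independent iterators,
   and each iterator keeps its own position. */
typedef struct { const int *items; size_t len; } mock_collection;
typedef struct { const mock_collection *seq; size_t pos; } mock_iterator;

/* Plays the collection's tp_iter: a fresh iterator every time. */
static mock_iterator mock_collection_iter(const mock_collection *c)
{
    mock_iterator it = { c, 0 };
    return it;
}

/* Plays tp_iternext: the next item, or NULL (no error) when exhausted. */
static const int *mock_iter_next(mock_iterator *it)
{
    assert(it->pos <= it->seq->len);
    if (it->pos == it->seq->len)
        return NULL;
    return &it->seq->items[it->pos++];
}
```

Because the position lives in the iterator rather than in the collection, two loops over the same collection do not interfere with each other, as with lists and tuples.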
Weak Reference Support
----------------------

One of the goals of Python’s weak-reference implementation is to allow any type to participate in the weak reference mechanism without incurring the overhead on those objects which do not benefit by weak referencing (such as numbers).

For an object to be weakly referencable, the extension must include a "PyObject*" field in the instance structure for the use of the weak reference mechanism; it must be initialized to *NULL* by the object’s constructor. It must also set the "tp_weaklistoffset" field of the corresponding type object to the offset of the field. For example, the instance type used for classic classes in Python 2 was defined with the following structure:

   typedef struct {
       PyObject_HEAD
       PyClassObject *in_class;       /* The class object */
       PyObject      *in_dict;        /* A dictionary */
       PyObject      *in_weakreflist; /* List of weak references */
   } PyInstanceObject;

The statically declared type object for instances was defined this way:

   PyTypeObject PyInstance_Type = {
       PyVarObject_HEAD_INIT(&PyType_Type, 0)
       "module.instance",

       /* Lots of stuff omitted for brevity... */

       Py_TPFLAGS_DEFAULT,                         /* tp_flags */
       0,                                          /* tp_doc */
       0,                                          /* tp_traverse */
       0,                                          /* tp_clear */
       0,                                          /* tp_richcompare */
       offsetof(PyInstanceObject, in_weakreflist), /* tp_weaklistoffset */
   };

The type constructor is responsible for initializing the weak reference list to *NULL*:

   static PyObject *
   instance_new() {
       /* Other initialization stuff omitted for brevity */

       self->in_weakreflist = NULL;

       return (PyObject *) self;
   }

The only further addition is that the destructor needs to call the weak reference manager to clear any weak references. This is only required if the weak reference list is non-*NULL*:

   static void
   instance_dealloc(PyInstanceObject *inst)
   {
       /* Allocate temporaries if needed, but do not begin
          destruction just yet.
        */

       if (inst->in_weakreflist != NULL)
           PyObject_ClearWeakRefs((PyObject *) inst);

       /* Proceed with object destruction normally. */
   }
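Generic code never needs to know the instance layout to find the weak-reference list: the offset stored in "tp_weaklistoffset" is enough. CPython’s "PyObject_GET_WEAKREFS_LISTPTR()" macro performs essentially this pointer arithmetic. The following plain-C sketch models it with hypothetical names (mock_instance, mock_weaklistoffset, mock_weaklist_ptr()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instance layout with a weak-reference list slot. */
typedef struct {
    long refcnt;            /* stands in for PyObject_HEAD */
    void *in_dict;
    void *in_weakreflist;   /* initialized to NULL by the constructor */
} mock_instance;

/* Stand-in for the value stored in tp_weaklistoffset. */
static const size_t mock_weaklistoffset = offsetof(mock_instance, in_weakreflist);

/* Locate the weak-reference list from the offset alone. An offset of 0
   would mean that the type does not support weak references. */
static void **mock_weaklist_ptr(void *obj, size_t offset)
{
    assert(offset > 0);
    return (void **)((char *)obj + offset);
}
```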
More Suggestions
----------------

Remember that you can omit most of these functions, in which case you provide "0" as a value. There are type definitions for each of the functions you must provide. They are in "object.h" in the Python include directory that comes with the source distribution of Python.

In order to learn how to implement any specific method for your new data type, do the following: Download and unpack the Python source distribution. Go to the "Objects" directory, then search the C source files for "tp_" plus the function you want (for example, "tp_richcompare"). You will find examples of the function you want to implement.

When you need to verify that an object is an instance of the type you are implementing, use the "PyObject_TypeCheck()" function. A sample of its use might be something like the following:

   if (! PyObject_TypeCheck(some_object, &MyType)) {
       PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
       return NULL;
   }

-[ Footnotes ]-

[1] This is true when we know that the object is a basic type, like a string or a float.

[2] We relied on this in the "tp_dealloc" handler in this example, because our type doesn’t support garbage collection. Even if a type supports garbage collection, there are calls that can be made to “untrack” the object from garbage collection; however, these calls are advanced and not covered here.

[3] We now know that the first and last members are strings, so perhaps we could be less careful about decrementing their reference counts; however, we accept instances of string subclasses. Even though deallocating normal strings won’t call back into our objects, we can’t guarantee that deallocating an instance of a string subclass won’t call back into our objects.

[4] Even in the third version, we aren’t guaranteed to avoid cycles. Instances of string subclasses are allowed and string subclasses could allow cycles even if normal strings don’t.