Types of Descriptors

Resuming from where we left off in the previous post, A first look at descriptors, it’s time to explore their different types and how they work internally.

In Python, almost everything is backed by a dictionary. Most objects store their attributes in a dictionary, and classes, being objects themselves, keep their attributes in a dictionary too. That dictionary is exposed through the __dict__ attribute.
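A quick way to see this is to compare an instance’s __dict__ with its class’s. Point is a hypothetical class used only for illustration:

```python
class Point:
    """Hypothetical class used only to inspect __dict__."""
    dimensions = 2  # class attribute: lives in Point.__dict__

    def __init__(self, x, y):
        self.x = x  # instance attributes: live in the instance's __dict__
        self.y = y


p = Point(1, 2)
print(p.__dict__)                        # {'x': 1, 'y': 2}
print('dimensions' in p.__dict__)        # False: not stored on the instance
print('dimensions' in Point.__dict__)    # True: stored on the class
print(type(Point.__dict__).__name__)     # mappingproxy: a read-only view
```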

There are two types of descriptors: data descriptors and non-data ones. If a descriptor defines __set__() or __delete__() (usually alongside __get__()), it’s called a data descriptor; if it only defines __get__(), it’s a non-data descriptor [1].
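The standard library’s inspect.isdatadescriptor() makes this classification; per its documentation, data descriptors are those with a __set__ or a __delete__ method. GetOnly, GetSet and GetDelete below are hypothetical classes written only to exercise it:

```python
import inspect


class GetOnly:
    """Only __get__: a non-data descriptor."""
    def __get__(self, instance, owner=None):
        return 42


class GetSet:
    """__get__ and __set__: a data descriptor."""
    def __get__(self, instance, owner=None):
        return 42

    def __set__(self, instance, value):
        pass


class GetDelete:
    """__get__ and __delete__ (no __set__): still a data descriptor."""
    def __get__(self, instance, owner=None):
        return 42

    def __delete__(self, instance):
        pass


# property is a built-in data descriptor, included for comparison.
for descriptor in (GetOnly(), GetSet(), GetDelete(), property()):
    kind = "data" if inspect.isdatadescriptor(descriptor) else "non-data"
    print(type(descriptor).__name__, kind)
```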

Note

Data descriptors take precedence over the instance’s dictionary of attributes, whereas for a non-data descriptor, the instance’s dictionary is looked up first.

The difference between them lies in how the attributes of the object are accessed, meaning which path Python follows, through the instance’s dictionary and the classes in the MRO (Method Resolution Order), to resolve our instruction.

For a non-data descriptor, when we have a statement like:

<instance>.<attribute> = <value>

Python will update the instance’s internal dictionary under the key for the name of the attribute, and store the value in it. This is the default behaviour for setting an attribute on an instance, and it applies because there is no __set__ defined to override it.

On the other hand, if we have a data descriptor (also called an overriding descriptor), the same instruction runs the descriptor’s __set__ method, because it’s defined. Analogously, when we access the attribute like:

<instance>.<descriptor>

The descriptor’s __get__ is what’s going to be called.

So, again, data (overriding) descriptors take precedence over the internal dictionary of an object, whereas non-data (non-overriding) ones do not.
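A minimal sketch of this precedence rule follows. NonData, Data and Example are hypothetical classes, not from the original post; writing straight into obj.__dict__ puts a competing value in the instance dictionary for both attributes:

```python
class NonData:
    """Non-data descriptor: only defines __get__."""
    def __get__(self, instance, owner=None):
        return "from NonData.__get__"


class Data:
    """Data descriptor: defines __get__ and __set__."""
    def __get__(self, instance, owner=None):
        return "from Data.__get__"

    def __set__(self, instance, value):
        # Deliberately discards the value, to make the precedence visible.
        pass


class Example:
    non_data = NonData()
    data = Data()


obj = Example()
# Write directly into the instance dictionary, bypassing any __set__.
obj.__dict__['non_data'] = "from instance __dict__"
obj.__dict__['data'] = "from instance __dict__"

print(obj.non_data)  # instance dictionary wins over the non-data descriptor
print(obj.data)      # data descriptor wins over the instance dictionary

obj.non_data = "assigned"  # no __set__: stored in obj.__dict__
obj.data = "assigned"      # Data.__set__ runs (and discards the value)
print(obj.__dict__)
```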

Lookup on Non-data Descriptors

In the previous example, when the object was first created it didn’t have any values for its attributes. If we inspect the object and its class, we’ll see that the object doesn’t have any key set for 'tv', but the class does:

>>> media.__dict__
{}

>>> media.__class__.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'VideoDriver' objects>,
              '__doc__': '...',
              '__module__': '...',
              '__weakref__': ...,
              'screen': <Resolution at 0x...>,
              'tv': <Resolution at 0x...>})

When we run media.tv the first time, there is no key 'tv' in media.__dict__, so Python searches the class and finds one. It gets that object, sees that it has a __get__, and returns whatever that method returns.

However, when we set the value with media.tv = (4096, 2160), there is no __set__ defined for the descriptor, so Python falls back to the default behaviour, which is updating media.__dict__. Therefore, the next time we ask for this attribute, it’s found in the instance’s dictionary and returned from there. By the same reasoning, the descriptor doesn’t have a __delete__ method either, so when del media.tv runs, the attribute is deleted from media.__dict__. That leaves us back in the original scenario, where the descriptor takes over again, acting as a default value holder.
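The Resolution descriptor comes from the previous post and isn’t shown here, so the sketch below assumes a minimal non-data version of it that only returns a default; the default resolutions are made-up values. It replays the get, set and delete sequence just described:

```python
class Resolution:
    """Hypothetical non-data descriptor acting as a default value holder.

    Assumed shape only; the previous post's Resolution may differ.
    """
    def __init__(self, default):
        self.default = default

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: return the descriptor
        return self.default


class VideoDriver:
    screen = Resolution((1920, 1080))  # made-up defaults
    tv = Resolution((1280, 720))


media = VideoDriver()
print(media.tv, media.__dict__)  # (1280, 720) {}: served by __get__

media.tv = (4096, 2160)          # no __set__: stored in media.__dict__
print(media.tv, media.__dict__)  # (4096, 2160) {'tv': (4096, 2160)}

del media.tv                     # no __delete__: removed from media.__dict__
print(media.tv, media.__dict__)  # (1280, 720) {}: the default again
```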

Functions are non-data descriptors

This is how methods work in Python: function objects are non-data descriptors that implement __get__().

If we think about it, according to object-oriented software theory, an object is a computational abstraction that represents an entity of the problem domain. An object has a set of methods it can work with, which determine its interface (what the object is and what it can do) [2].

However, in more technical terms, objects are just implemented with a data structure (in Python, dictionaries), and their behaviour, determined by their methods, is implemented with plain functions. Again, methods are just functions. Let’s prove it [3].

If we define a class like this one and inspect its dictionary, we’ll see that whatever we defined as methods is actually stored as functions in the dictionary of the class.

class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, other_person):
        print(f"Hi {other_person.name}, I'm {self.name}!")

We can see that, among all the things defined in the class, its dictionary contains an entry for ‘greet’, whose value is a function.

>>> type(Person.greet)
<class 'function'>

>>> Person.__dict__
mappingproxy({'__dict__': ...
              'greet': <function ...Person.greet>})

This means that, in fact, it’s the same as having a function defined outside the class that knows how to work with an instance of that same class, which by convention in Python is called self. Therefore, inside the class we’re just creating functions that know how to work with an instance of that class, and Python provides this object as the first parameter, under the name we usually call self. This is basically what the __get__ method does for functions: it returns the function bound to that object (a bound method).
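We can perform that binding step by hand, calling the function’s __get__ ourselves. The Person class repeats the one above; the names alice, bob and bound are illustrative:

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, other_person):
        print(f"Hi {other_person.name}, I'm {self.name}!")


alice, bob = Person("Alice"), Person("Bob")

# The class dictionary holds the plain function.
function = Person.__dict__['greet']

# Calling its __get__ with an instance binds the function to that instance,
# which is what happens behind the scenes when we write alice.greet.
bound = function.__get__(alice, Person)
print(bound)                       # a bound method of the Person instance
print(bound.__func__ is function)  # True: it wraps the same function
print(bound.__self__ is alice)     # True: and remembers the instance

bound(bob)                         # Hi Bob, I'm Alice!
Person.greet(alice, bob)           # same call, passing self explicitly
```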

In CPython this logic is implemented in C, but let’s see if we can create an equivalent example, just to get a clear picture. Imagine we have a custom function and we want to attach it to a class as an instance method.

First, we have an isolated function that computes the mean time between failures for an object that collects metrics on the systems it monitors. Then we have a class called SystemMonitor that represents all sorts of objects that collect metrics on monitored systems.

def mtbf(system_monitor):
    """Mean Time Between Failures
    https://en.wikipedia.org/wiki/Mean_time_between_failures
    """
    operational_intervals = zip(
        system_monitor.downtimes,
        system_monitor.uptimes)

    operational_time = sum(
        (start_downtime - start_uptime)
        for start_downtime, start_uptime in operational_intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


class SystemMonitor:
    """Collect metrics on software & hardware components."""
    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []

    def up(self, when):
        self.uptimes.append(when)

    def down(self, when):
        self.downtimes.append(when)

For now we just test the function, but soon we’ll want it as a method of the class. We can easily apply the function to work with a SystemMonitor instance:

>>> monitor = SystemMonitor('prod')
>>> monitor.uptimes = [0, 7, 12]
>>> monitor.downtimes = [5, 12]

>>> mtbf(monitor)
5.0

But now we want it to be part of the class, so that we can use it as an instance method. If we simply assign the function to an attribute of the instance, calling it fails, because it’s not bound:

>>> monitor.mtbf = mtbf
>>> monitor.mtbf()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-...> in <module>()
----> 1 monitor.mtbf()

TypeError: mtbf() missing 1 required positional argument: 'system_monitor'

In this case, the required system_monitor positional argument is the instance, which in methods is referred to as self.

Now, if the function is bound to the object, the scenario changes. We can bind it the same way Python does, by calling __get__:

>>> monitor.mtbf = mtbf.__get__(monitor)
>>> monitor.mtbf()
5.0

Now, we want to be able to define this function inside the class, the same way we do with methods, like def mtbf(self): .... In this case, for simplicity, I’ll use a callable object that represents the actual function (the body of __call__ plays the role of the body we write after the function’s definition). And we’ll declare it as an attribute of the class, much like all methods:

class SystemMonitor:
    ...
    mtbf = MTBF()

Provided that MTBF is a callable object (again, representing our “function”), this is equivalent to doing def mtbf(self): ... inside the class.

In the body of the callable we can just reuse the original function, for simplicity. What’s really interesting is the __get__ method, in which we return the callable object bound to the instance, exposed as a method.

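import types

class MTBF:
    """Compute Mean Time Between Failures"""
    def __call__(self, instance):
        return mtbf(instance)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class (SystemMonitor.mtbf): return it as is.
            return self
        # Accessed on an instance: bind it, the way functions do.
        return types.MethodType(self, instance)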

To explain: the attribute mtbf is a “function” (a callable, actually) defined in the class. When we access it through an instance, Python sees that it has a __get__ and calls it. That returns another object, the callable bound to the instance, which passes the instance (self) as the first parameter and is, in turn, what gets executed.
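Putting the pieces together, here is a self-contained sketch that should reproduce the earlier 5.0 through the descriptor. It repeats mtbf, trims SystemMonitor to what the example needs, and handles access on the class itself (instance is None) by returning the descriptor unchanged:

```python
import types


def mtbf(system_monitor):
    """Mean Time Between Failures (same logic as the function above)."""
    operational_intervals = zip(system_monitor.downtimes, system_monitor.uptimes)
    operational_time = sum(
        start_downtime - start_uptime
        for start_downtime, start_uptime in operational_intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


class MTBF:
    """Callable playing the role of a function defined in the class body."""
    def __call__(self, instance):
        return mtbf(instance)  # refers to the module-level function

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # SystemMonitor.mtbf: the unbound "function"
        return types.MethodType(self, instance)


class SystemMonitor:
    """Collect metrics on software & hardware components."""
    mtbf = MTBF()

    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []


monitor = SystemMonitor('prod')
monitor.uptimes = [0, 7, 12]
monitor.downtimes = [5, 12]

print(monitor.mtbf())               # 5.0: __get__ binds, then __call__ runs
print(SystemMonitor.mtbf(monitor))  # 5.0: like calling a function with self
```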

This is the trick that makes functions work as methods, a very elegant solution in CPython.

We can now appreciate the elegance of the design behind methods: instead of creating a whole new kind of object, Python reuses functions, under the assumption that the first parameter will be an instance of the class, used internally and by convention called self (although it can be named otherwise).

Following a similar logic, the classmethod and staticmethod decorators are also descriptors. The former passes the class as the first argument (which is why class methods take cls as their first argument), and the latter simply returns the function as it is.
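Simplified pure-Python stand-ins, in the spirit of the equivalents shown in the Python Descriptor HowTo guide, make the difference concrete. ClassMethodSketch, StaticMethodSketch and Temperature are hypothetical names, and the built-ins handle more cases than these sketches:

```python
import types


class ClassMethodSketch:
    """Hypothetical, simplified stand-in for classmethod."""
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Bind to the class, not the instance: it becomes the first argument.
        return types.MethodType(self.function, owner)


class StaticMethodSketch:
    """Hypothetical, simplified stand-in for staticmethod."""
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        # No binding at all: hand back the plain function.
        return self.function


class Temperature:
    scale = "Celsius"

    @ClassMethodSketch
    def describe(cls):
        return f"{cls.__name__} in {cls.scale}"

    @StaticMethodSketch
    def to_fahrenheit(celsius):
        return celsius * 9 / 5 + 32


print(Temperature.describe())          # Temperature in Celsius
print(Temperature().describe())        # still the class, even via an instance
print(Temperature.to_fahrenheit(100))  # 212.0: no implicit first argument
```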

Lookup on Data Descriptors

In the previous example, when we assigned a value to the attribute handled by the descriptor, the instance dictionary was modified, because there was no __set__ method on the descriptor.

For data descriptors, unlike in the previous example, the methods on the descriptor object take precedence, meaning that the lookup starts with the class, and Python doesn’t touch the instance’s dictionary on its own. This asymmetry is what characterises data descriptors.

In the previous examples, if the instance’s __dict__ was modified after running the descriptor, it was because the code explicitly did so, but it could have followed a different logic.

class DataDescriptor:
    """This descriptor holds the same values for all instances."""
    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        self.value = value

class Managed:
    descriptor = DataDescriptor()

If we run it, we can see that, since this descriptor holds the data internally, the instance’s __dict__ is never modified [4]:

>>> managed = Managed()
>>> vars(managed)
{}
>>> managed.descriptor = 'foo'
>>> managed.descriptor
'foo'
>>> vars(managed)
{}

>>> managed_2 = Managed()
>>> vars(managed_2)
{}
>>> managed_2.descriptor
'foo'
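By contrast, a data descriptor can keep one value per instance by writing into the instance’s __dict__ itself, under a key of its choosing. This is a sketch with a hypothetical PerInstance class (it relies on __set_name__, available since Python 3.6). Because it is a data descriptor, a key with the attribute’s own name in the instance dictionary is ignored:

```python
class PerInstance:
    """Hypothetical data descriptor storing one value per instance.

    Values go into instance.__dict__ under a private key, so they are
    not shared across instances the way DataDescriptor's value is.
    """
    def __set_name__(self, owner, name):
        self.storage_key = f"_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.storage_key]
        except KeyError:
            raise AttributeError(self.storage_key) from None

    def __set__(self, instance, value):
        instance.__dict__[self.storage_key] = value


class Managed:
    descriptor = PerInstance()


first, second = Managed(), Managed()
first.descriptor = 'foo'
second.descriptor = 'bar'
print(first.descriptor, second.descriptor)  # foo bar: no sharing
print(vars(first))                          # {'_descriptor': 'foo'}

# A same-named key in the instance dictionary does not win:
first.__dict__['descriptor'] = 'shadow?'
print(first.descriptor)                     # foo: still served by __get__
```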

Method Lookup

The descriptor machinery is triggered by __getattribute__, so we have to be careful if we override this method (better not to), because if it’s not done properly, we might prevent the descriptor calls [5].

Warning

Classes might turn off the descriptor protocol by overriding __getattribute__.
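A small sketch of how that can happen. Answer, Normal and Careless are hypothetical classes; Careless overrides __getattribute__ carelessly, reading straight from the instance dictionary:

```python
class Answer:
    """Hypothetical data descriptor that always returns a fixed string."""
    def __get__(self, instance, owner=None):
        return "from the descriptor"

    def __set__(self, instance, value):
        # Store under the hard-coded key 'attr' (enough for this sketch).
        instance.__dict__['attr'] = value


class Normal:
    attr = Answer()


class Careless:
    attr = Answer()

    def __getattribute__(self, name):
        # Skips object.__getattribute__, and with it the descriptor
        # protocol, for every name except __dict__ itself.
        if name == '__dict__':
            return object.__getattribute__(self, name)
        return self.__dict__[name]


normal, careless = Normal(), Careless()
normal.attr = 'value'    # Answer.__set__ runs
careless.attr = 'value'  # Answer.__set__ runs here too (__setattr__ untouched)

print(normal.attr)    # from the descriptor: __get__ takes precedence
print(careless.attr)  # value: __get__ is never called
```

Careless also breaks ordinary lookups: careless.__class__, or any method defined on the class, now raises KeyError, which is one more reason to leave __getattribute__ alone.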

[1] https://docs.python.org/3.6/howto/descriptor.html#descriptor-protocol
[2] Duck typing
[3] This means that in reality, objects are just data structures with functions working on them, much like ADTs (Abstract Data Types) in C, or structs defined in Go with the functions that work over them. A more detailed analysis and explanation of this deserves a separate post.
[4] This is not a good practice (except for very particular scenarios that might require it, of course), but it’s shown only to support the idea.
[5] https://docs.python.org/3/howto/descriptor.html#invoking-descriptors