Types of Descriptors

Resuming from where we left off in the previous post, in which we took a first look at descriptors, it's time to explore their different types and how they work internally.

In Python, almost everything is backed by a dictionary. The attributes of an object are normally stored in a dictionary, and since classes are objects too, their attributes also live in one. This is exposed through the __dict__ attribute that objects have.

There are two types of descriptors: data descriptors and non-data ones. If a descriptor defines __set__() or __delete__() (typically alongside __get__()), it's called a data descriptor; if it only defines __get__(), it's a non-data descriptor [1].

Note

Data descriptors take precedence over the instance's dictionary of attributes, whereas in the case of a non-data descriptor, the instance's internal dictionary may be looked up first.

The difference between them lies in how the attributes of the object are accessed, meaning which path Python's attribute lookup follows (the instance's dictionary, or the classes in the MRO, the Method Resolution Order) in order to carry out our instruction.

For a non-data descriptor, when we have a statement like:

<instance>.<attribute> = <value>

Python will update the instance’s internal dictionary under the key for the name of the attribute, and store the value in it. This follows the default behaviour of setting an attribute in an instance because there is no __set__ defined to override it.

On the other hand, if we have a data descriptor (also called an overriding descriptor), the __set__ method will be run for the same instruction, because it's defined. Analogously, when we access the attribute like:

<instance>.<descriptor>

The __get__ method of the descriptor is what's going to be called.

So, again, data (overriding) descriptors take precedence over the internal dictionary of an object, whereas non-data (non-overriding) ones do not.
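To make the precedence rule concrete, here is a minimal sketch. The classes NonDataDescriptor, DataDescriptor and Example are made up for this illustration (they are not from the previous post). The instance dictionary is filled in directly through vars(), so that both kinds of descriptor compete against a key with the same name.

```python
class NonDataDescriptor:
    """Only defines __get__, so an instance dictionary entry can shadow it."""
    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class DataDescriptor:
    """Defines __get__ and __set__, so it wins over the instance dictionary."""
    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only attribute")


class Example:
    non_data = NonDataDescriptor()
    data = DataDescriptor()


obj = Example()

# Write same-named keys straight into the instance dictionary,
# bypassing the normal attribute assignment.
vars(obj)["non_data"] = "from instance"
vars(obj)["data"] = "from instance"

print(obj.non_data)  # from instance
print(obj.data)      # from data descriptor
```

Both keys are present in vars(obj), yet only the non-data descriptor is shadowed by them.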

Lookup on Non-data Descriptors

In the previous example, when the object was first created, it didn't have any values for its attributes. If we inspect the object and its class, we'll see that the object doesn't have any key set for 'tv', but the class does:

>>> media.__dict__
{}

>>> media.__class__.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'VideoDriver' objects>,
              '__doc__': '...',
              '__module__': '...',
              '__weakref__': ...
              'screen': <Resolution at 0x...>,
              'tv': <Resolution at 0x...>})

When we run media.tv for the first time, there is no key 'tv' in media.__dict__, so Python searches the class and finds one. It gets that object, sees that it has a __get__ method, and returns whatever that method returns.

However, when we set the value like media.tv = (4096, 2160), there is no __set__ defined for the descriptor, so Python falls back to the default behaviour, which is updating media.__dict__. Therefore, the next time we ask for this attribute, it's going to be found in the instance's dictionary and returned. By analogy, the descriptor doesn't have a __delete__ method either, so when the instruction del media.tv runs, the attribute is deleted from media.__dict__. This leaves us back in the original scenario, where the descriptor takes over again, acting as a default value holder.
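The whole cycle can be reproduced with a small sketch. The real Resolution class comes from the previous post and is not shown here, so the version below is an assumed, simplified stand-in that only returns a default value (the default of (1024, 768) is also made up).

```python
class Resolution:
    """Hypothetical non-data descriptor: it only defines __get__."""
    def __init__(self, default):
        self.default = default

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed through the class
        return self.default


class VideoDriver:
    tv = Resolution((1024, 768))


media = VideoDriver()
print(vars(media))  # {}
print(media.tv)     # (1024, 768), provided by the descriptor

media.tv = (4096, 2160)  # no __set__: stored in the instance dictionary
print(vars(media))  # {'tv': (4096, 2160)}
print(media.tv)     # (4096, 2160), found in the instance dictionary

del media.tv        # no __delete__: removed from the instance dictionary
print(vars(media))  # {}
print(media.tv)     # (1024, 768), back to the descriptor
```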

Functions are non-data descriptors

This is how methods work in Python: function objects are non-data descriptors that implement __get__().

If we think about it, according to object-oriented software theory, an object is a computational abstraction that represents an entity of the problem domain. An object has a set of methods it can work with, which determines its interface (what the object is and can do) [2].

However, in more technical terms, objects are just implemented with a data structure (in Python, dictionaries), and their behaviour, determined by their methods, is just functions. Again, methods are just functions. Let's prove it [3].

If we have a class like this and inspect its dictionary, we'll see that whatever we defined as methods are actually functions stored internally in the dictionary of the class.

class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, other_person):
        print(f"Hi {other_person.name}, I'm {self.name}!")

We can see that, among all the things defined in the class, its dictionary contains an entry for 'greet', whose value is a function.

>>> type(Person.greet)
<class 'function'>

>>> Person.__dict__
mappingproxy({'__dict__': ...
              'greet': <function ...Person.greet>})

This means that, in fact, it's the same as having a function defined outside the class that knows how to work with an instance of that class. Inside the class, we're just creating functions that know how to work with an instance, and Python provides this object as the first parameter, under the name that by convention we call self. This is basically what the __get__ method does for functions: it returns the function bound to that object (a bound method).
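We can check this by calling __get__ by hand on the function stored in the class dictionary. The sketch below reuses the Person class, with one change made only for the example: greet returns the string instead of printing it, so the results can be compared.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, other_person):
        return f"Hi {other_person.name}, I'm {self.name}!"


alice = Person("Alice")
bob = Person("Bob")

# The plain function, exactly as stored in the class dictionary
function = Person.__dict__["greet"]

# Calling __get__ explicitly is what attribute access does for us
bound = function.__get__(alice, Person)

print(bound.__self__ is alice)     # True
print(bound.__func__ is function)  # True
print(bound(bob))                  # Hi Bob, I'm Alice!
print(alice.greet(bob) == function(alice, bob))  # True
```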

In CPython, this logic is implemented in C, but let’s see if we can create an equivalent example, just to get a clear picture. Imagine we have a custom function, and we want to apply it to a class, as an instance method.

First we have an isolated function that computes the mean time between failures for an object that collects metrics on the systems it monitors. Then we have a class called SystemMonitor, which represents all sorts of objects that collect metrics on monitored systems.

def mtbf(system_monitor):
    """Mean Time Between Failures
    https://en.wikipedia.org/wiki/Mean_time_between_failures
    """
    operational_intervals = zip(
        system_monitor.downtimes,
        system_monitor.uptimes)

    operational_time = sum(
        (start_downtime - start_uptime)
        for start_downtime, start_uptime in operational_intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


class SystemMonitor:
    """Collect metrics on software & hardware components."""
    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []

    def up(self, when):
        self.uptimes.append(when)

    def down(self, when):
        self.downtimes.append(when)

For now we just test the function, but soon we’ll want this as a method of the class. We can easily apply the function to work with a SystemMonitor instance:

>>> monitor = SystemMonitor('prod')
>>> monitor.uptimes = [0, 7, 12]
>>> monitor.downtimes = [5, 12]

>>> mtbf(monitor)
5.0

But now we want it to be part of the class, so that we can use it as an instance method. If we simply assign the function as an attribute of the instance, calling it will fail, because it's not bound:

>>> monitor.mtbf = mtbf
>>> monitor.mtbf()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-...> in <module>()
----> 1 monitor.mtbf()

TypeError: mtbf() missing 1 required positional argument: 'system_monitor'

In this case, the system_monitor positional argument that the function requires is the instance, which in methods is referred to as self.

Now, if the function is bound to the object, the scenario changes. We can do that the same way Python does: __get__.

>>> monitor.mtbf = mtbf.__get__(monitor)
>>> monitor.mtbf()
5.0
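The reason the plain assignment failed is that descriptors are only invoked when they are found in a class, not in the instance dictionary. As a sketch of that difference (using condensed versions of the same function and class), assigning the function to the class instead lets the regular lookup call __get__ for us:

```python
class SystemMonitor:
    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []


def mtbf(system_monitor):
    intervals = zip(system_monitor.downtimes, system_monitor.uptimes)
    operational_time = sum(down - up for down, up in intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


monitor = SystemMonitor("prod")
monitor.uptimes = [0, 7, 12]
monitor.downtimes = [5, 12]

# Stored in the instance dictionary: returned as is, __get__ is not called
monitor.mtbf = mtbf
try:
    monitor.mtbf()
except TypeError as error:
    print(f"not bound: {error}")

del monitor.mtbf

# Stored in the class dictionary: the lookup goes through mtbf.__get__
SystemMonitor.mtbf = mtbf
print(monitor.mtbf())  # 5.0
```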

Now we want to be able to define this function inside the class, the same way we do with methods, like def mtbf(self): .... In this case, for simplicity, I'll just use a callable object that stands in for the actual function object (the body of __call__ represents what we would put in the body of the function after its definition). We'll declare it as an attribute of the class, much like all methods:

class SystemMonitor:
    ...
    mtbf = MTBF()

Provided that MTBF is a callable object (again, representing our "function"), this is equivalent to doing def mtbf(self): ... inside the class.

In the body of the callable, we can just reuse the original function, for simplicity. What's really interesting is the __get__ method, in which we return the callable object, exposed as a method.

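import types

class MTBF:
    """Compute Mean Time Between Failures"""
    def __call__(self, instance):
        return mtbf(instance)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed through the class: there is nothing to bind to
            return self
        return types.MethodType(self, instance)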

To explain: the attribute mtbf is a "function" (a callable, actually) defined in the class. When we access it through an instance, Python sees that it has a __get__ and calls it. This returns another object, the callable bound to the instance, which is what ends up being executed, with the instance passed as the first parameter.
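Putting the pieces together, the following is a self-contained sketch of the idea, with a condensed mtbf function. The check for instance being None is an addition for when the attribute is accessed through the class, because types.MethodType refuses to bind to None.

```python
import types


def mtbf(system_monitor):
    intervals = zip(system_monitor.downtimes, system_monitor.uptimes)
    operational_time = sum(down - up for down, up in intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


class MTBF:
    """Callable object standing in for a function defined in the class."""
    def __call__(self, instance):
        return mtbf(instance)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed through the class: nothing to bind
        return types.MethodType(self, instance)


class SystemMonitor:
    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []

    mtbf = MTBF()


monitor = SystemMonitor("prod")
monitor.uptimes = [0, 7, 12]
monitor.downtimes = [5, 12]

bound = monitor.mtbf
print(type(bound).__name__)                  # method
print(bound.__self__ is monitor)             # True
print(monitor.mtbf())                        # 5.0
print(isinstance(SystemMonitor.mtbf, MTBF))  # True
```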

This is the mechanism that makes functions work as methods in CPython.

We can now appreciate the elegance of the design behind methods: instead of creating a whole new kind of object, reuse functions under the assumption that the first parameter will be an instance of that class, which is going to be used internally and is by convention called self (although it can be called otherwise).

Following a similar logic, the classmethod and staticmethod decorators are also descriptors. The former passes the class as the first argument (which is why class methods take cls as their first parameter), and the latter simply returns the function as it is.
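As a rough illustration, both decorators can be approximated in pure Python. The names MyStaticMethod, MyClassMethod and Temperature are made up for this sketch, and the real built-ins are implemented in C and handle more cases than this.

```python
import types


class MyStaticMethod:
    """Rough sketch of staticmethod: __get__ returns the function untouched."""
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        return self.function


class MyClassMethod:
    """Rough sketch of classmethod: __get__ binds the function to the class."""
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        return types.MethodType(self.function, owner)


class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @MyStaticMethod
    def to_celsius(fahrenheit):
        return (fahrenheit - 32) * 5 / 9

    @MyClassMethod
    def from_fahrenheit(cls, fahrenheit):
        return cls(cls.to_celsius(fahrenheit))


print(Temperature.to_celsius(212))                  # 100.0
print(Temperature.from_fahrenheit(32).celsius)      # 0.0
print(Temperature(0).from_fahrenheit(212).celsius)  # 100.0
```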

Lookup on Data Descriptors

In the previous example, when we assigned a value to the attribute managed by the descriptor, the instance dictionary was modified because there was no __set__ method on the descriptor.

For data descriptors, unlike in the previous example, the methods on the descriptor object take precedence, meaning that the lookup starts at the class and doesn't touch the instance's dictionary. This asymmetry is what characterises data descriptors.

In the previous examples, if the __dict__ of the instance was modified after the descriptor ran, it was because the code explicitly did so; it could have followed a different logic.

class DataDescriptor:
    """This descriptor holds the same values for all instances."""
    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        self.value = value

class Managed:
    descriptor = DataDescriptor()

If we run it, we can see that, since this descriptor holds the data internally, __dict__ is never modified on the instance [4]:

>>> managed = Managed()
>>> vars(managed)
{}
>>> managed.descriptor = 'foo'
>>> managed.descriptor
'foo'
>>> vars(managed)
{}

>>> managed_2 = Managed()
>>> vars(managed_2)
{}
>>> managed_2.descriptor
'foo'
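For contrast, here is a sketch of the more common arrangement, in which the data descriptor explicitly writes to the instance's __dict__, so each instance keeps its own value. The PerInstance class is made up for this example; it relies on __set_name__ (available since Python 3.6) to learn the attribute name.

```python
class PerInstance:
    """Data descriptor that keeps each value in the instance's __dict__."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        # Explicit write: this is the only reason __dict__ changes
        instance.__dict__[self.name] = value


class Managed:
    descriptor = PerInstance()


first = Managed()
second = Managed()
first.descriptor = "foo"

print(vars(first))                    # {'descriptor': 'foo'}
print(vars(second))                   # {}
print(first.descriptor)               # foo
print(hasattr(second, "descriptor"))  # False
```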

Method Lookup

The descriptor machinery is triggered by __getattribute__, so we have to be careful if we override this method (better not to), because if it's not done properly, we might prevent the descriptor calls [5].

Warning

Classes might turn off the descriptor protocol by overriding __getattribute__.
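As a small sketch of how this can happen, the Bypassed class below has a deliberately naive __getattribute__ (written only for this illustration) that reads the class dictionary directly and never calls __get__ on what it finds:

```python
class Ten:
    """Non-data descriptor that always returns 10."""
    def __get__(self, instance, owner=None):
        return 10


class Regular:
    value = Ten()


class Bypassed:
    value = Ten()

    def __getattribute__(self, name):
        # Naive override: returns the raw object stored in the class,
        # so the descriptor protocol is never triggered.
        class_dict = type(self).__dict__
        if name in class_dict:
            return class_dict[name]
        return object.__getattribute__(self, name)


print(Regular().value)                    # 10
print(isinstance(Bypassed().value, Ten))  # True
```

With the default __getattribute__ we get the value computed by the descriptor, whereas the overridden one hands back the descriptor object itself.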

[1] https://docs.python.org/3.6/howto/descriptor.html#descriptor-protocol

[2] Duck typing

[3] This means that in reality, objects are just data structures with functions on them, much like ADTs (Abstract Data Types) in C, or the structs defined in Go with the functions that work over them. A more detailed analysis and explanation of this deserves a separate post.

[4] This is not a good practice (except for very particular scenarios that might require it, of course); it's shown only to support the idea.

[5] https://docs.python.org/3/howto/descriptor.html#invoking-descriptors