Types of Descriptors

Re­su­ming from whe­re we le­ft off, on the pre­vious pos­t, on whi­ch we took A first look at des­crip­tors, it’s ti­me to ex­plo­re their di­ffe­rent ty­pes and how they wo­rk in­ter­na­ll­y.

In Python, almost everything is represented with a dictionary. Objects are dictionaries. Classes are objects, hence they also are contained into a dictionary. This is denoted by the __dict__ attribute that objects have.

There are two types of descriptors: data descriptors and non-data ones. If a descriptor implements both [1] __get__() and __set__(), it’s called a data descriptor; otherwise is a non-data descriptor.

No­te

Da­ta des­crip­tors take pre­ce­den­ce over the ins­tan­ce’s dic­tio­na­ry of a­ttri­bu­tes, whe­reas in the ca­se of a no­n-­da­ta des­crip­to­r, the ins­tan­ce’s in­ter­nal dic­tio­na­ry may be looked up firs­t.

The di­ffe­ren­ce be­tween the­m, lies on how the pro­per­ties in the ob­ject are ac­ce­ss­e­d, mea­ning whi­ch pa­th wi­ll the MRO (Me­thod Re­so­lu­tion Or­de­r) of ­P­y­thon fo­llo­w, in or­der to com­ply wi­th our ins­truc­tio­n.

For a no­n-­da­ta des­crip­to­r, when we ha­ve an sta­te­ment like:

<instance>.<attribute> = <value>

Py­thon wi­ll up­da­te the ins­tan­ce’s in­ter­nal dic­tio­na­ry un­der the key for the ­na­me of the attri­bu­te, and sto­re the va­lue in it. This fo­llo­ws the de­faul­t ­be­ha­viour of se­tting an attri­bu­te in an ins­tan­ce be­cau­se the­re is no __se­t__ de­fi­ned to ove­rri­de it.

On the other han­d, if we ha­ve a da­ta des­crip­tor (al­so ca­lled ove­rri­ding des­crip­to­r), for the sa­me ins­truc­tion the __se­t__ me­thod wi­ll be ran ­be­cau­se it’s de­fi­ne­d. And ana­lo­gous­l­y, when we ac­ce­ss the pro­per­ty like:

<instance>.<descriptor>

The __­ge­t__ on des­crip­tor is wha­t’s going to be ca­lle­d.

So, agai­n, da­ta (o­ve­rri­din­g) des­crip­tors take pre­ce­den­ce over the in­ter­na­l ­dic­tio­na­ry of an ob­jec­t, whe­reas non da­ta (no­n-o­ve­rri­din­g) ones do no­t.

Lookup on Non-data Descriptors

On the pre­vious exam­ple, when ­the ob­ject was first created it did­n’t ha­ve any va­lues for their pro­per­tie­s. If we ins­pect the ob­jec­t, and its cla­ss, we’­ll see that it does­n’t ha­ve any ke­ys set for 'tv', but the cla­ss does:

>>> media.__dict__
{}

>>> media.__class__.__dict__
mappingproxy({'__dict__': <attribute '__dict__' of 'VideoDriver' objects>,
              '__doc__': '...',
              '__module__': '...',
              '__weakref__': ...
              'screen': <Resolution at 0x...>,
              'tv': <Resolution at 0x...>})

When we run me­dia.­tv the first ti­me, the­re is no key 'tv' on me­dia.__­dic­t__, so Py­thon tries to sear­ch in the cla­ss, and foun­ds one, it ­ge­ts the ob­jec­t, sees that the ob­ject has a __­ge­t__, and re­turns whate­ve­r ­that me­thod re­turn­s.

Ho­we­ver when we set the va­lue like me­dia.­tv = (4096, 2160), the­re is no __se­t__ de­fi­ned for the des­crip­to­r, so Py­thon runs wi­th the de­faul­t ­be­ha­viour in this ca­se, whi­ch is up­da­ting me­dia.__­dic­t__. The­re­fo­re, nex­t ­ti­me we ask for this attri­bu­te, it’s going to be found in the ins­tan­ce’s ­dic­tio­na­ry and re­tur­ne­d. By ana­lo­gy we can see that it does­n’t ha­ve a __­de­le­te__ me­thod ei­the­r, so when the ins­truc­tion del me­dia.­tv run­s, ­this attri­bu­te wi­ll be de­le­ted from me­dia.__­dic­t__, whi­ch lea­ves us ba­ck in ­the ori­gi­nal sce­na­rio, whe­re the des­crip­tor takes pla­ce, ac­ting as a de­faul­t ­va­lue hol­de­r.

Functions are non-data descriptors

This is how methods work in Python: function objects, are non-data descriptors that implement __get__().

If we thi­nk about it, ac­cor­ding to ob­jec­t-o­rien­ted so­ftwa­re theo­r­y, an ob­jec­t is a com­pu­ta­tio­nal abs­trac­tion that re­pre­sen­ts an en­ti­ty of the do­main pro­ble­m. An ob­ject has a set of me­tho­ds that can wo­rk wi­th, whi­ch de­ter­mi­nes its in­ter­fa­ce (what the ob­ject is and can do) [2].

Ho­we­ve­r, in mo­re te­ch­ni­cal ter­ms, ob­jec­ts are just im­ple­men­ted wi­th a da­ta s­truc­tu­re (that in Py­thon are dic­tio­na­rie­s), and it’s be­ha­viou­r, de­ter­mi­ne­d by their me­tho­d­s, are just func­tion­s. Agai­n, me­tho­ds are just func­tion­s. Le­t’s ­pro­ve it [3].

If we ha­ve a cla­ss like this and ins­pect its dic­tio­na­ry we’­ll see that whate­ve­r we de­fi­ned as me­tho­d­s, are ac­tua­lly func­tions sto­red in­ter­na­lly in the ­dic­tio­na­ry of the cla­ss.

class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, other_person):
        print(f"Hi {other_person.name}, I'm {self.name}!")

We can see that among all the things de­fi­ned in the cla­ss, it’s dic­tio­na­r­y ­con­tains an en­try for ‘gree­t’, who­se va­lue is a func­tio­n.

>>> type(Person.greet)
<class 'function'>

>>> Person.__dict__
mappingproxy({'__dict__': ...
              'greet': <function ...Person.greet>})

This means that in fac­t, it’s the sa­me as ha­ving a func­tion de­fi­ned ou­tsi­de the ­cla­ss, that kno­ws how to wo­rk wi­th an ins­tan­ce of that sa­me cla­ss, whi­ch by ­con­ven­tion in Py­thon is ca­lled se­lf. The­re­fo­re in­si­de the cla­ss, we’­re jus­t ­crea­ting func­tions that know how to wo­rk wi­th an ins­tan­ce of that cla­ss, an­d ­P­y­thon wi­ll pro­vi­de this ob­jec­t, as a first pa­ra­me­te­r, un­der the na­me that we u­sua­lly ca­ll se­lf. This is ba­si­ca­lly what the __­ge­t__ me­thod does fo­r ­func­tion­s: it re­turns a bound ins­tan­ce of the func­tion to that ob­jec­t.

In CP­y­thon, this lo­gic is im­ple­men­ted in C, but le­t’s see if we can ­crea­te an equi­va­lent exam­ple, just to get a clear pic­tu­re. Ima­gi­ne we ha­ve a ­cus­tom func­tio­n, and we want to apply it to a cla­ss, as an ins­tan­ce me­tho­d.

First we ha­ve an iso­lated func­tio­n, that com­pu­tes the mean ti­me be­tween ­fai­lu­res for an ob­ject that co­llec­ts me­tri­cs on sys­te­ms that mo­ni­tor­s. Then we ha­ve a cla­ss ca­lled Sys­te­m­Mo­ni­tor, that re­pre­sen­ts all sort of ob­jec­ts tha­t ­co­llect me­tri­cs on mo­ni­to­red sys­te­ms.

def mtbf(system_monitor):
    """Mean Time Between Failures
    https://en.wikipedia.org/wiki/Mean_time_between_failures
    """
    operational_intervals = zip(
        system_monitor.downtimes,
        system_monitor.uptimes)

    operational_time = sum(
        (start_downtime - start_uptime)
        for start_downtime, start_uptime in operational_intervals)
    try:
        return operational_time / len(system_monitor.downtimes)
    except ZeroDivisionError:
        return 0


class SystemMonitor:
    """Collect metrics on software & hardware components."""
    def __init__(self, name):
        self.name = name
        self.uptimes = []
        self.downtimes = []

    def up(self, when):
        self.uptimes.append(when)

    def down(self, when):
        self.downtimes.append(when)

For now we just test the func­tio­n, but soon we’­ll want this as a me­thod of the ­cla­ss. We can ea­si­ly apply the func­tion to wo­rk wi­th a Sys­te­m­Mo­ni­tor ins­tan­ce:

>>> monitor = SystemMonitor('prod')
>>> monitor.uptimes = [0,7, 12]
>>> monitor.downtimes = [5, 12]

>>> mtbf(monitor)
>>> 5.0

But now we want it to be part of the cla­ss, so that I can use it as a ins­tan­ce ­me­tho­d. If we try to as­sign the func­tion as a me­tho­d, it wi­ll just fai­l, ­be­cau­se it’s not boun­d:

>>> monitor.mtbf = mtbf
>>> monitor.mtbf()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-...> in <module>()
----> 1 monitor.mtbf()

TypeError: mtbf() missing 1 required positional argument: 'system_monitor'

In this ca­se the sys­te­m_­mo­ni­tor po­si­tio­nal ar­gu­ment that re­qui­res, is the ins­tan­ce, whi­ch in me­tho­ds is re­fe­rred to as se­lf.

Now, if the function is bound to the object, the scenario changes. We can do that the same way Python does: __get__.

>>> monitor.mtbf = mtbf.__get__(monitor)
>>> monitor.mtbf()
5.0

Now, we want to be able to define this function inside the class, the same way we do with methods, like def mtbf(self):.... In this case, for simplicity, I’ll just use a callable object, that represents the actual object function (the body of __call__ would represent what we put on the body of the function after it’s definition). And we’ll declare it as an attribute of the class, much like all methods:

class SystemMonitor:
    ...
    mtbf = MTBF()

Pro­vi­ded that MTBF is a ca­lla­ble ob­ject (a­gai­n, re­pre­sen­ting ou­r “­func­tio­n”), is equi­va­lent to doing def mtbf(sel­f): ... in­si­de the cla­ss.

In the body of the callable, we can just reuse the original function, for simplicity. What’s really interesting is the __get__ method, on which we return the callable object, exposed as a method.

class MTBF:
    """Compute Mean Time Between Failures"""
    def __call__(self, instance):
        return mtbf(instance)

    def __get__(self, instance, owner=None):
        return types.MethodType(self, instance)

To ex­plai­n: the attri­bu­te mtbf is a “func­tio­n” (ca­lla­ble ac­tua­ll­y), de­fi­ne­d in the cla­ss. When we ca­ll it as a me­tho­d, Py­thon wi­ll see it has a __­ge­t__, and when this is ca­lle­d, it wi­ll re­turn ano­ther ob­ject whi­ch is ­the func­tion bound to the ins­tan­ce, pa­s­sing se­lf as first pa­ra­me­te­r, whi­ch in ­turn is wha­t’s going to be exe­cute­d.

This does the tri­ck of making func­tions wo­rk as me­tho­d­s, whi­ch is a ve­r­y e­le­gant so­lu­tion of CP­y­thon.

We can now appre­cia­te the ele­gan­ce of the de­sign be­hind me­tho­d­s: ins­tead of ­crea­ting a who­le new ob­jec­t, reu­se func­tions un­der the as­sump­tion that the ­first pa­ra­me­ter wi­ll be an ins­tan­ce of that cla­ss, that is going to be us­e­d in­ter­na­ll­y, and by con­ven­tion ca­lled se­lf (al­thou­gh, it can be ca­lle­d o­the­rwi­se).

Fo­llo­wing a si­mi­lar lo­gi­c, cla­ss­me­thod, and sta­ti­c­me­thod de­co­ra­tor­s, a­re al­so des­crip­tor­s. The for­me­r, pa­s­ses the cla­ss as the first ar­gu­ment (whi­ch is why cla­ss me­tho­ds start wi­th cls as a first ar­gu­men­t), and the la­tte­r, ­sim­ply re­turns the func­tion as it is.

Lookup on Data Descriptors

On the pre­vious exam­ple, when we as­sig­ned a va­lue to the pro­per­ty of the ­des­crip­to­r, the ins­tan­ce dic­tio­na­ry was mo­di­fied be­cau­se the­re was no __se­t__ me­thod on the des­crip­to­r.

For da­ta des­crip­tor­s, un­like on the pre­vious exam­ple, the me­tho­ds on the ­des­crip­tor ob­ject take pre­ce­den­ce, mea­ning that the lookup star­ts by the cla­ss, and does­n’t affect the ins­tan­ce’s dic­tio­na­r­y. This is an as­y­m­me­tr­y, tha­t ­cha­rac­te­ri­ses da­ta des­crip­tor­s.

On the pre­vious exam­ple­s, if after run­ning the des­crip­to­r, the __­dic­t__ on ­the ins­tan­ce was mo­di­fie­d, it was be­cau­se the co­de ex­pli­ci­tly did so, but it ­could ha­ve had a di­ffe­rent lo­gi­c.

class DataDescriptor:
    """This descriptor holds the same values for all instances."""
    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        self.value = value

class Managed:
    descriptor = DataDescriptor()

If we run it, we can see, that sin­ce this des­crip­tor hol­ds the da­ta in­ter­na­ll­y, __­dic­t__ is ne­ver mo­di­fied on the ins­tan­ce [4]:

>>> managed = Managed()
>>> vars(managed)
{}
>>> managed.descriptor = 'foo'
>>> managed.descriptor
'foo'
>>> vars(managed)
{}

>>> managed_2 = Managed()
>>> vars(managed_2)
{}
>>> managed_2.descriptor
'foo'

Method Lookup

The des­crip­tors ma­chi­ne­ry is tri­gge­red by __­ge­ta­ttri­bu­te__, so we ha­ve to­ ­be ca­re­ful if we are ove­rri­ding this me­thod (be­tter no­t), be­cau­se if it’s no­t ­do­ne pro­per­l­y, we mi­ght pre­vent the des­crip­tor ca­lls [5]

War­ning

Cla­s­ses mi­ght turn off the des­crip­tor pro­to­col by ove­rri­din­g __­ge­ta­ttri­bu­te__.

[1] https://docs.python.org/3.6/howto/descriptor.html#descriptor-protocol
[2] Duck typing
[3] This means that in reality, objects are just data structures with functions on it, much like ADT (Abstract Data Types) in C, or the structs defined in Go with the functions that work over them. A more detailed analysis and explanation of this, deserves a separate post.
[4] This is not a good practice, (except for very particular scenarios that might require it, of course), but it’s shown only to support the idea.
[5] https://docs.python.org/3/howto/descriptor.html#invoking-descriptors