A first look at descriptors

Descriptors are one of the most powerful features of Python. They are powerful because they let us control the core operations (get, set, delete) [1] of an attribute in a given object, so that we can hook in code of our own to modify or extend the original operation.

A descriptor is an object that implements at least one of __get__, __set__, or __delete__.

As of Python 3.6 [2], the descriptor protocol consists of these methods:

__get__(self, instance, owner)
__set__(self, instance, value)
__delete__(self, instance)
__set_name__(self, owner, name)

We’ll understand better what the parameters mean once we’ve seen some examples of descriptors and how they’re used.
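As a first orientation, the four methods can be sketched in a single class. LoggedAttribute and Point are hypothetical names made up for this illustration. Keeping the call log on the descriptor itself is only a shortcut for the demo, not a recommended design.

```python
class LoggedAttribute:
    """Hypothetical descriptor that records which protocol methods run."""

    def __set_name__(self, owner, name):
        # Called once, when the owner class body is executed.
        self.name = name
        self.calls = [f"__set_name__({owner.__name__}, {name!r})"]

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed from the class: return the descriptor itself.
            return self
        self.calls.append("__get__")
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        self.calls.append(f"__set__({value!r})")
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.calls.append("__delete__")
        instance.__dict__.pop(self.name, None)


class Point:
    x = LoggedAttribute()


point = Point()
point.x = 10       # triggers __set__
value = point.x    # triggers __get__
del point.x        # triggers __delete__

print(Point.x.calls)
```

The printed list should show one entry per protocol method, in the order in which Python invoked them.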

How to use them

In order to use descriptors we need at least two classes: one for the descriptor itself, and the class that is going to use the descriptor objects (often referred to as the managed class).

Getting Data

Consider this basic example, in which I have a fictional manager for video output that can handle multiple devices. Each device is set with a particular resolution, provided by a user. However, if for some reason one of the devices does not have a rendering resolution set, we want to use a default one, specified in the class definition.

A possible implementation could look like this.

descriptors0_get0.py

class Resolution:
    """Represents the resolution for a video display. In case there is no
    resolution set, return a default value, previously indicated.
    """
    def __init__(self, attr_name, default_resolution):
        self.attr_name = attr_name
        self.default_resolution = default_resolution

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.default_resolution


class VideoDriver:
    """Contains multiple display devices, each one with a resolution
    configured. If a resolution is not set for a device, return a default one,
    provided by this class, as a fallback.

    >>> media = VideoDriver()
    >>> media.tv
    (1024, 768)
    >>> media.tv = (4096, 2160)
    >>> media.tv
    (4096, 2160)
    >>> del media.tv
    >>> media.tv
    (1024, 768)
    >>> media.screen
    (1920, 1080)
    >>> VideoDriver.tv  # doctest: +ELLIPSIS
    <__main__.Resolution object at 0x...>
    """
    tv = Resolution('tv', (1024, 768))
    screen = Resolution('screen', (1920, 1080))


if __name__ == '__main__':
    import doctest
    doctest.testmod()

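In this case Resolution is a descriptor that implements only __get__(). Because it defines neither __set__() nor __delete__(), a value assigned on an instance is stored in that instance’s __dict__ and takes precedence on later lookups, so if an instance of the display manager has a resolution set, it will retrieve just that one. On the other hand, if it does not, then when we access one of the class attributes like media.tv, what actually happens is that Python calls: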

VideoDriver.tv.__get__(media, VideoDriver)

This executes the code in the __get__() method of the descriptor, which in this case returns the default value passed when the descriptor was created.
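The fallback behaviour can be checked step by step with a reduced version of the same idea. Default and Device below are hypothetical stand-ins for the classes in the listing, cut down to the minimum needed to watch the instance dictionary change.

```python
class Default:
    """Hypothetical descriptor that defines only __get__."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.value


class Device:
    resolution = Default((1024, 768))


device = Device()
fallback = device.resolution        # nothing stored yet, so __get__ runs

device.resolution = (4096, 2160)    # no __set__: stored on the instance
stored = dict(vars(device))
overridden = device.resolution      # the instance value wins

del device.resolution               # removes the instance entry only
restored = device.resolution        # __get__ runs again

print(fallback, stored, overridden, restored)
```

After the del statement the instance dictionary is empty again, which is why the default comes back.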

In general [4], code like:

<instance>.<descriptor>

Will be translated to:

type(<instance>).__dict__['<descriptor>'].__get__(<instance>, type(<instance>))

When the descriptor is called from the class, and not the instance, the value of the parameter “instance” is None, but the “owner” is still a reference to the class being invoked (that’s probably one of the reasons why these are two separate parameters, instead of just letting the user derive the class from the instance; it allows even more flexibility).

For this reason, it is common to handle this case and return the descriptor itself, which is the rationale behind the lines:

if instance is None:
    return self

That is why when you define a property in a class and access it from an instance object, you’ll get the result of the computation of the method. However, if you access the property from the class, you get the property object.
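The behaviour of property described above can be verified with a small sketch. The Circle class is a hypothetical example, not part of the original listings.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        return 2 * self.radius


circle = Circle(3)

# Access through the instance runs the getter.
from_instance = circle.diameter

# Access through the class returns the property object itself.
from_class = Circle.diameter

# The same two cases, spelled out as explicit descriptor calls.
raw = vars(Circle)["diameter"]
explicit_instance = raw.__get__(circle, Circle)
explicit_class = raw.__get__(None, Circle)

print(from_instance, isinstance(from_class, property))
print(explicit_instance, explicit_class is raw)
```

Looking the attribute up in vars(Circle) bypasses the descriptor machinery, so the explicit __get__() calls make visible what the dotted access does implicitly.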

Setting Data

Example: imagine we want to have some attributes in an object that are going to be traced by other attributes that keep track of how many times their values changed. So, for example, for every attribute <x> on the object, there would be a corresponding count_<x> one, that will keep count of how many times x changed its value. For simplicity let’s assume attributes named count_<name> cannot be modified, and those only correspond to the count of attribute <name>.

There may be several ways to address this problem. One way could be overriding __setattr__(). Another option could be to use properties (getters and setters) for each attribute we want to track. Or, we can use descriptors.

Both the properties and the __setattr__() approaches might be subject to code repetition. With properties, the logic has to be repeated for each attribute, unless a property function builder is created (in order to reuse the logic of the property across several variables). As for the __setattr__() strategy, if we need to use this logic in multiple classes we would have to come up with some sort of mixin class in order to achieve it, and if one of the classes already overrides this method, things might get overcomplicated.
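To make the repetition concrete, here is a rough sketch of what the property-based alternative might look like. PropertyTraveller and the underscore-prefixed storage attributes are assumptions made for this illustration only.

```python
class PropertyTraveller:
    """Hypothetical property-based version: the logic is written per attribute."""

    def __init__(self, name):
        self.name = name

    @property
    def city(self):
        return self._city

    @city.setter
    def city(self, value):
        if not hasattr(self, "_city"):
            self.count_city = 0          # first assignment
        elif self._city != value:
            self.count_city += 1         # value actually changed
        self._city = value

    @property
    def country(self):
        return self._country

    @country.setter
    def country(self, value):
        # Same logic again, only the names differ.
        if not hasattr(self, "_country"):
            self.count_country = 0
        elif self._country != value:
            self.count_country += 1
        self._country = value


tourist = PropertyTraveller("John Smith")
tourist.city = "Barcelona"
tourist.city = "Stockholm"
tourist.country = "Spain"
print(tourist.count_city, tourist.count_country)
```

Each traced attribute needs its own getter and setter pair, and the counting logic is duplicated in every setter.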

These two options seem rather convoluted. Descriptors it is, then.

descriptors0_set0.py

class TracedProperty:
    """Keep count of how many times an attribute changed its value"""

    def __set_name__(self, owner, name):
        self.name = name
        self.count_name = f'count_{name}'

    def __set__(self, instance, value):
        try:
            current_value = instance.__dict__[self.name]
        except KeyError:
            instance.__dict__[self.count_name] = 0
        else:
            if current_value != value:
                instance.__dict__[self.count_name] += 1

        instance.__dict__[self.name] = value


class Traveller:
    """
    >>> tourist = Traveller('John Smith')
    >>> tourist.city = 'Barcelona'
    >>> tourist.country = 'Spain'

    >>> tourist.count_city
    0
    >>> tourist.count_country
    0

    >>> tourist.city = 'Stockholm'
    >>> tourist.country = 'Sweden'
    >>> tourist.count_city
    1
    >>> tourist.count_country
    1
    >>> tourist.city = 'Gothenburg'
    >>> tourist.count_city
    2
    >>> tourist.count_country
    1
    >>> tourist.country = 'Sweden'
    >>> tourist.count_country
    1
    """
    city = TracedProperty()
    country = TracedProperty()

    def __init__(self, name):
        self.name = name


if __name__ == '__main__':
    import doctest
    doctest.testmod()

The docstring on the Traveller class pretty much explains its intended use. The important thing about this is the public interface: it’s absolutely transparent for the user. An object that interacts with a Traveller instance gets a clean interface, with the properties exposed, without having to worry about the underlying implementation.

So, we have two classes with different but related responsibilities, because they interact towards a common goal. Traveller has two class attributes that are instances of the descriptor.

Now let’s take a look at the other side of it, the internal working of the descriptor.

Under this schema, Python will translate a call like:

traveller = Traveller('John Smith')
traveller.city = 'Stockholm'

To the one using the __set__ method in the descriptor, like:

Traveller.city.__set__(traveller, 'Stockholm')

This means that the __set__ method on the descriptor is going to receive the instance of the object being accessed as its first parameter, and then the value that is being assigned.

More generally we could say that something like:

obj.<descriptor> = <value>

Translates to:

type(obj).__dict__['<descriptor>'].__set__(obj, <value>)

With these two parameters, we can manipulate the interaction any way we want, which makes the protocol really powerful.

In this example, we are taking advantage of this by querying the original object’s attribute dictionary (instance.__dict__) and getting the value in order to compare it with the newly received one. By reading this value, we calculate another attribute which will hold the count of the number of times the attribute was modified, and then both of them are updated in the original dictionary for the instance.
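The effect on the instance dictionary can be inspected directly. The sketch below repeats the TracedProperty logic from the listing so that it runs on its own, uses a Traveller with a single traced attribute, and issues the second assignment as an explicit __set__() call to show that it is equivalent to the regular syntax.

```python
class TracedProperty:
    """Same logic as the listing, repeated to keep this snippet standalone."""

    def __set_name__(self, owner, name):
        self.name = name
        self.count_name = f"count_{name}"

    def __set__(self, instance, value):
        try:
            current_value = instance.__dict__[self.name]
        except KeyError:
            instance.__dict__[self.count_name] = 0
        else:
            if current_value != value:
                instance.__dict__[self.count_name] += 1
        instance.__dict__[self.name] = value


class Traveller:
    city = TracedProperty()

    def __init__(self, name):
        self.name = name


tourist = Traveller("John Smith")
tourist.city = "Barcelona"                 # regular assignment syntax

# The equivalent explicit call on the descriptor object.
descriptor = vars(Traveller)["city"]
descriptor.__set__(tourist, "Stockholm")

print(vars(tourist))
```

Both the value and its counter live in the instance dictionary, next to the ordinary name attribute. The descriptor keeps only the attribute names.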

An important concept to point out is that this implementation not only works, but also solves the problem in a more generic fashion. In this example, it was the case of a traveller, of whom we wanted to know how many times they changed location, but the exact same object could be used, for example, to monitor market stocks, variables in an equation, etc. This exposes functionality as a sort of library, toolkit, or even framework. In fact, many well-known frameworks in Python use descriptors to expose their API.

Deleting Data

The __delete__() method is going to be called when an instruction of the type del <instance>.<descriptor> is executed. See the following example.

descriptors0_delete0.py

"""An example of a descriptor with a ``__delete__()`` method.
The code is for illustration purposes only, and it does not correspond to any
actual implementation.
"""


class ProtectedAttribute:
    """A class attribute that can be protected against deletion"""

    def __set_name__(self, owner, name):
        self.name = name

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        raise AttributeError(f"Can't delete {self.name} for {instance!s}")


class ProtectedUser:
    """
    >>> usr = ProtectedUser('jsmith', '127.0.0.1')
    >>> usr.username
    'jsmith'
    >>> del usr.username
    Traceback (most recent call last):
    ...
    AttributeError: Can't delete username for ProtectedUser[jsmith]
    >>> usr.location
    '127.0.0.1'
    >>> del usr.location
    >>> usr.location
    Traceback (most recent call last):
    ...
    AttributeError: 'ProtectedUser' object has no attribute 'location'
    """
    username = ProtectedAttribute()

    def __init__(self, username, location):
        self.username = username
        self.location = location

    def __str__(self):
        return f"{self.__class__.__name__}[{self.username}]"


if __name__ == '__main__':
    import doctest
    doctest.testmod()

In this example, we just want a property in the object that cannot be deleted, and descriptors, again, provide one of the multiple possible implementations.

Caveats and recommendations

  • Remember that descriptors should always be used as class attributes.
  • Data should be stored in each original managed instance, instead of doing data bookkeeping in the descriptor. Each object should have its data in its __dict__.
  • Preserve the ability of accessing the descriptor from the class as well, not only from instances. Mind the case when instance is None, so it can be called as type(instance).descriptor.
  • Be careful when overriding __getattribute__(): descriptors are invoked by its default implementation, so an override that does not delegate to it can make them lose effect.
  • Mind the difference between data and non-data descriptors [3].
  • Implement the minimum required interface.
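The recommendation about storing data in each managed instance is easier to appreciate next to a counterexample. In the following sketch, SharedState, PerInstance, and Account are hypothetical names. SharedState deliberately does the bookkeeping on the descriptor to show what goes wrong.

```python
class SharedState:
    """Hypothetical descriptor that (wrongly) keeps the value on itself."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.value

    def __set__(self, instance, value):
        self.value = value  # one slot shared by every managed instance


class PerInstance:
    """Hypothetical descriptor that stores the value in the instance __dict__."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Account:
    shared = SharedState()
    own = PerInstance()


first, second = Account(), Account()
first.shared, second.shared = "a", "b"
first.own, second.own = "a", "b"

print(first.shared, second.shared)  # both read the last value written
print(first.own, second.own)        # each instance keeps its own value
```

Because a descriptor is a class attribute, there is a single descriptor object per class, so anything stored on it is shared by all instances of the managed class.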

Food for thought

Descriptors provide a framework for abstracting away repetitive access logic. The term framework here is not a coincidence. As the reader might have noticed, by using descriptors, there is an inversion of control (IoC) in the code, because Python will be calling the logic we put under the descriptor methods when accessing these attributes from the managed instance.

Under these considerations it is correct to think that it behaves as a framework.

Summary

Descriptors provide an API to control the core access to an object’s data model, at its low-level operations. By means of descriptors we can control the execution of an object’s interface, because they provide a transparent layer between the public interface (what is exposed to users) and the internal representation and storage of data.

They are one of the most powerful features of Python, and their possibilities are virtually unlimited, so in this post we’ve only scratched the surface of them. More details, such as exploring the different types of descriptors with their internal representation or data, the use of the new __set_name__ magic method, their relation with decorators, and analysis of good implementations, are some of the topics for future entries.

[1] Python Cookbook (3rd edition) - David Beazley & Brian K. Jones
[2] https://docs.python.org/3.6/reference/datamodel.html#descriptors
[3] More details about this will come in a future post.
[4] https://docs.python.org/3.6/howto/descriptor.html#invoking-descriptors