A first look at descriptors

Descriptors are one of the most powerful features of Python. They are so powerful because they let us control the core operations (get, set, delete) [1] on an attribute of a given object, so that we can hook in code of our own to modify or extend the original operation.

A descriptor is an object that implements at least one of __get__, __set__, or __delete__.

As of Python 3.6 [2], the descriptor protocol comprises these methods:

__get__(self, instance, owner)
__set__(self, instance, value)
__delete__(self, instance)
__set_name__(self, owner, name)

We’ll understand better what the parameters mean once we’ve seen some examples of descriptors and how they’re used.
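
As a quick preview of when each method is invoked, here is a minimal sketch. The Recorder descriptor and the Managed class are hypothetical names used only for illustration; each method simply appends what Python passed to it to a list.

```python
calls = []  # records every protocol call, in order


class Recorder:
    """Hypothetical descriptor that records how Python invokes each method."""

    def __set_name__(self, owner, name):
        # Called once, when the body of the owner class is executed.
        self.name = name
        calls.append(('__set_name__', owner.__name__, name))

    def __get__(self, instance, owner):
        # instance is None when the attribute is looked up on the class.
        calls.append(('__get__', instance is None, owner.__name__))
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        calls.append(('__set__', value))
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        calls.append(('__delete__',))
        instance.__dict__.pop(self.name, None)


class Managed:
    attr = Recorder()


obj = Managed()
obj.attr = 42       # triggers __set__(obj, 42)
value = obj.attr    # triggers __get__(obj, Managed)
Managed.attr        # triggers __get__(None, Managed)
del obj.attr        # triggers __delete__(obj)

for call in calls:
    print(call)
```

Note that __set_name__ receives the owner class, not an instance, because it runs while the class is being created.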

How to use them

In order to use descriptors we need at least two classes: one for the descriptor itself, and one that is going to use the descriptor objects (often referred to as the managed class).

Getting Data

Consider this basic example, in which I have a fictional manager for video output that can handle multiple devices. Each device is set with a particular resolution, provided by a user. However, if for some reason one of the devices does not have a rendering resolution set, we want to use a default one, specified in the class definition.

A possible implementation could look like this.

descriptors0_get0.py

class Resolution:
    """Represents the resolution for a video display. In case there is no
    resolution set, return a default value, previously indicated.
    """

    def __init__(self, attr_name, default_resolution):
        self.attr_name = attr_name
        self.default_resolution = default_resolution

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.default_resolution


class VideoDriver:
    """Contains multiple display devices, each one with a resolution
    configured. If a resolution is not set for a device, return a default one,
    provided by this class, as a fallback.

    >>> media = VideoDriver()
    >>> media.tv
    (1024, 768)
    >>> media.tv = (4096, 2160)
    >>> media.tv
    (4096, 2160)
    >>> del media.tv
    >>> media.tv
    (1024, 768)
    >>> media.screen
    (1920, 1080)
    >>> VideoDriver.tv  # doctest: +ELLIPSIS
    <__main__.Resolution object at 0x...>
    """

    tv = Resolution('tv', (1024, 768))
    screen = Resolution('screen', (1920, 1080))


if __name__ == '__main__':
    import doctest
    doctest.testmod()

In this case, Resolution is a descriptor that implements only __get__(). If an instance of the display manager has a resolution set, it will retrieve just that one. If it does not, then when we access one of the class attributes, like media.tv, what actually happens is that Python calls:

VideoDriver.tv.__get__(media, VideoDriver)

This executes the code in the __get__() method of the descriptor, which in this case returns the default value passed when the descriptor was created.
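
The fallback relies on the fact that a descriptor implementing only __get__() gives way to an entry with the same name in the instance's __dict__. Here is a minimal sketch of that lookup order, using hypothetical Default and Device names rather than the classes above:

```python
class Default:
    """Hypothetical descriptor that implements only __get__."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value


class Device:
    resolution = Default((1024, 768))


device = Device()
from_descriptor = device.resolution   # nothing in device.__dict__ yet

device.resolution = (4096, 2160)      # stored in device.__dict__
from_instance = device.resolution     # the instance entry takes priority

del device.resolution                 # removes the instance entry only
after_delete = device.resolution      # back to the descriptor's default

print(from_descriptor, from_instance, after_delete)
```

Assignment and deletion never reach the descriptor here, because it defines neither __set__ nor __delete__.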

In general [4], code like:

<instance>.<descriptor>

is translated to:

type(<instance>).<descriptor>.__get__(<instance>, type(<instance>))

When the descriptor is called from the class, and not from an instance, the value of the parameter “instance” is None, but “owner” is still a reference to the class being invoked (that’s probably one of the reasons why these are two separate parameters, instead of just letting the user derive the class from the instance: it allows even more flexibility).

For this reason, it is common to handle this case and return the descriptor itself, which is the rationale behind the lines:

if instance is None:
    return self

That is why, when you define a property in a class and access it from an instance, you get the result of the computation of the method. However, if you access the property from the class, you get the property object.
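
This behaviour of property can be checked with a short sketch; the Circle class is a hypothetical example, not part of the code above.

```python
class Circle:
    """Hypothetical class used only to show how property behaves."""

    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        return 2 * self.radius


circle = Circle(3)
print(circle.diameter)        # accessed from an instance: the computed value
print(type(Circle.diameter))  # accessed from the class: the property object
```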

Setting Data

Example: imagine we want some attributes of an object to be traced by other attributes that keep track of how many times their values changed. So, for example, for every attribute <x> on the object, there would be a corresponding count_<x> one that keeps count of how many times <x> changed its value. For simplicity, let’s assume that attributes named count_<name> cannot be modified, and that they only correspond to the count of attribute <name>.

There may be several ways to address this problem. One way could be overriding __setattr__(). Another option could be to use properties (getters and setters) for each attribute we want to track. Or, we can use descriptors.

Both the properties and the __setattr__() approaches might be subject to code repetition. With properties, the logic would have to be repeated for each attribute, unless a property function builder is created (in order to reuse the logic of the property across several variables). As for the __setattr__() strategy, if we need to use this logic in multiple classes we would have to come up with some sort of mixin class, and if one of the classes already overrides this method, things might get overcomplicated.
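
To make the comparison concrete, a property function builder could look roughly like the following sketch. The traced function, the Tourist class, and the underscore-prefixed storage names are all hypothetical choices for this illustration.

```python
def traced(name):
    """Hypothetical builder returning a property that counts value changes."""
    count_name = f'count_{name}'
    storage_name = f'_{name}'

    def getter(self):
        return getattr(self, storage_name)

    def setter(self, value):
        if not hasattr(self, storage_name):
            # First assignment: start the counter at zero.
            setattr(self, count_name, 0)
        elif getattr(self, storage_name) != value:
            setattr(self, count_name, getattr(self, count_name) + 1)
        setattr(self, storage_name, value)

    return property(getter, setter)


class Tourist:
    # The attribute name has to be spelled twice on each line.
    city = traced('city')
    country = traced('country')


tourist = Tourist()
tourist.city = 'Barcelona'
tourist.city = 'Stockholm'
tourist.country = 'Spain'
print(tourist.city, tourist.count_city, tourist.count_country)
```

It works, but every attribute name must be repeated as a string argument, which is easy to get out of sync.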

These two options seem rather convoluted. Descriptors it is, then.

descriptors0_set0.py

class TracedProperty:
    """Keep count of how many times an attribute changed its value"""

    def __set_name__(self, owner, name):
        self.name = name
        self.count_name = f'count_{name}'

    def __set__(self, instance, value):
        try:
            current_value = instance.__dict__[self.name]
        except KeyError:
            instance.__dict__[self.count_name] = 0
        else:
            if current_value != value:
                instance.__dict__[self.count_name] += 1
        instance.__dict__[self.name] = value


class Traveller:
    """
    >>> tourist = Traveller('John Smith')
    >>> tourist.city = 'Barcelona'
    >>> tourist.country = 'Spain'
    >>> tourist.count_city
    0
    >>> tourist.count_country
    0
    >>> tourist.city = 'Stockholm'
    >>> tourist.country = 'Sweden'
    >>> tourist.count_city
    1
    >>> tourist.count_country
    1
    >>> tourist.city = 'Gothenburg'
    >>> tourist.count_city
    2
    >>> tourist.count_country
    1
    >>> tourist.country = 'Sweden'
    >>> tourist.count_country
    1
    """

    city = TracedProperty()
    country = TracedProperty()

    def __init__(self, name):
        self.name = name


if __name__ == '__main__':
    import doctest
    doctest.testmod()

The docstring on the Traveller class pretty much explains its intended use. The important thing here is the public interface: it’s absolutely transparent for the user. An object that interacts with a Traveller instance gets a clean interface, with the properties exposed, without having to worry about the underlying implementation.

So, we have two classes with different responsibilities, but related, because they interact towards a common goal. Traveller has two class attributes that are instances of the descriptor.

Now let’s take a look at the other side of it, the internal workings of the descriptor.

Under this scheme, Python will translate a call like:

traveller = Traveller('John Smith')
traveller.city = 'Stockholm'

into one that uses the __set__ method of the descriptor, like:

Traveller.city.__set__(traveller, 'Stockholm')

This means that the __set__ method on the descriptor receives the instance of the object being accessed as its first parameter, and then the value that is being assigned.

More generally, we could say that something like:

obj.<descriptor> = <value>

translates to:

type(obj).<descriptor>.__set__(obj, <value>)

With these two parameters, we can manipulate the interaction any way we want, which makes the protocol really powerful.

In this example, we take advantage of this by querying the original object’s attribute dictionary (instance.__dict__) and getting the current value in order to compare it with the newly received one. From this comparison we compute another attribute, which holds the count of the number of times the attribute was modified, and then both of them are updated in the instance’s dictionary.

An important point is that this implementation not only works, but also solves the problem in a more generic fashion. In this example it was the case of a traveller, for whom we wanted to know how many times they changed location, but the exact same object could be used, for example, to monitor market stocks, variables in an equation, etc. This exposes functionality as a sort of library, toolkit, or even framework. In fact, many well-known frameworks in Python use descriptors to expose their API.
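
As a sketch of that reuse, the same descriptor can be attached to a completely unrelated class. StockQuote and its price attribute are hypothetical names; TracedProperty is repeated here only so that the snippet runs on its own.

```python
class TracedProperty:
    """Same descriptor as in the example above, repeated to keep this runnable."""

    def __set_name__(self, owner, name):
        self.name = name
        self.count_name = f'count_{name}'

    def __set__(self, instance, value):
        try:
            current_value = instance.__dict__[self.name]
        except KeyError:
            instance.__dict__[self.count_name] = 0
        else:
            if current_value != value:
                instance.__dict__[self.count_name] += 1
        instance.__dict__[self.name] = value


class StockQuote:
    """Hypothetical managed class, unrelated to Traveller."""

    price = TracedProperty()

    def __init__(self, symbol):
        self.symbol = symbol


quote = StockQuote('XYZ')
for price in (10.0, 10.5, 10.5, 9.8):
    quote.price = price   # the repeated 10.5 does not count as a change

print(quote.price, quote.count_price)
```

No code in StockQuote knows about the counting logic; declaring the class attribute is enough.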

Deleting Data

The __delete__() method is going to be called when an instruction of the type del <instance>.<descriptor> is executed. See the following example.

descriptors0_delete0.py

"""An example of a descriptor with a ``__delete__()`` method.
The code is for illustration purposes only, and it does not correspond to any
actual implementation.
"""


class ProtectedAttribute:
    """A class attribute that can be protected against deletion"""

    def __set_name__(self, owner, name):
        self.name = name

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        raise AttributeError(f"Can't delete {self.name} for {instance!s}")


class ProtectedUser:
    """
    >>> usr = ProtectedUser('jsmith', '127.0.0.1')
    >>> usr.username
    'jsmith'
    >>> del usr.username
    Traceback (most recent call last):
    ...
    AttributeError: Can't delete username for ProtectedUser[jsmith]
    >>> usr.location
    '127.0.0.1'
    >>> del usr.location
    >>> usr.location
    Traceback (most recent call last):
    ...
    AttributeError: 'ProtectedUser' object has no attribute 'location'
    """

    username = ProtectedAttribute()

    def __init__(self, username, location):
        self.username = username
        self.location = location

    def __str__(self):
        return f"{self.__class__.__name__}[{self.username}]"


if __name__ == '__main__':
    import doctest
    doctest.testmod()

In this example, we just want an attribute in the object that cannot be deleted, and descriptors, again, provide one of the multiple possible implementations.
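
For comparison, one of those other implementations could override __delattr__() on the class itself. This is only a sketch with a hypothetical LockedUser class; unlike the descriptor, the protection logic lives inside the class and cannot be reused elsewhere as it is.

```python
class LockedUser:
    """Hypothetical class that protects attributes through __delattr__."""

    _protected = frozenset({'username'})

    def __init__(self, username, location):
        self.username = username
        self.location = location

    def __delattr__(self, name):
        if name in self._protected:
            raise AttributeError(f"Can't delete {name}")
        super().__delattr__(name)


user = LockedUser('jsmith', '127.0.0.1')
del user.location   # allowed: not listed in _protected

try:
    del user.username
except AttributeError as error:
    message = str(error)

print(message)
```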

Caveats and recommendations

  • Remember that descriptors should always be used as class attributes.

  • Data should be stored in each original managed instance, instead of doing data bookkeeping in the descriptor. Each object should have its data in its __dict__.

  • Preserve the ability to access the descriptor from the class as well, not only from instances. Mind the case when instance is None, so it can be accessed as type(instance).descriptor.

  • Do not override __getattribute__(), or descriptors might lose their effect.

  • Mind the difference between data and non-data descriptors [3].

  • Implement the minimum required interface.
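
The recommendation about where to store data is easy to get wrong. The following sketch contrasts a descriptor that keeps the value on itself with one that stores it in the instance's __dict__; SharedState, PerInstance and Point are hypothetical names made up for this illustration.

```python
class SharedState:
    """Anti-pattern: the value lives on the descriptor, shared by all instances."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value

    def __set__(self, instance, value):
        self.value = value


class PerInstance:
    """Recommended: the value lives in each instance's __dict__."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Point:
    x = SharedState()
    y = PerInstance()


a, b = Point(), Point()
a.x, a.y = 1, 1
b.x, b.y = 2, 2

print(a.x, a.y)   # a.x was overwritten through b; a.y was not
```

Since there is a single descriptor object per class attribute, anything stored on it is visible to every instance of the managed class.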

Food for thought

Descriptors provide a framework for abstracting away repetitive access logic. The term framework here is not a coincidence. As the reader might have noticed, by using descriptors there is an inversion of control (IoC) in the code, because Python will be calling the logic we put under the descriptor methods when these attributes are accessed from the managed instance.

Under these considerations, it is correct to think that it behaves as a framework.

Summary

Descriptors provide an API to control the core access to an object’s data model, at its low-level operations. By means of descriptors we can control the execution of an object’s interface, because they provide a transparent layer between the public interface (what is exposed to users) and the internal representation and storage of data.

They are one of the most powerful features of Python, and their possibilities are virtually unlimited, so in this post we’ve only scratched the surface. More details, such as exploring the different types of descriptors with their internal representation or data, the use of the new __set_name__ magic method, their relation with decorators, and an analysis of good implementations, are some of the topics for future entries.

[1] Python Cookbook (3rd edition) - David Beazley & Brian K. Jones

[2] https://docs.python.org/3.6/reference/datamodel.html#descriptors

[3] More details about this will come in a future post.

[4] https://docs.python.org/3.6/howto/descriptor.html#invoking-descriptors