Subtleties of Python

In our pro­fes­sion, at­ten­tion to de­tail is of ut­most im­por­tance. Any good ­soft­ware en­gi­neer un­der­stands that de­tails mat­ter a lot, as they make the d­if­fer­ence be­tween a work­ing unit or a dis­as­ter [1].

This is why clean code is not just about for­mat­ting or ar­rang­ing the code. It’s nei­ther a foible. It is in­stead, pay­ing at­ten­tion ex­act­ly to those de­tails that will make a big dif­fer­ence in pro­duc­tion.

Let’s see some ex­am­ples of this in Python.

Dis­claimer: The fol­low­ing cas­es be­ing pre­sent­ed are just ex­am­ples. They are not meant to be in­ter­pret­ed as pat­tern­s, in the sense of “ev­ery time you find X, ap­ply Y”.

Be­low there are two short cas­es of prob­lem­at­ic code I have found in whilst ­work­ing on past code bases.

Mutable Objects and Attributes

This is one of the main rea­sons why I like func­tion­al pro­gram­ming: im­mutabil­i­ty.

In func­tion­al soft­ware, we would­n’t be talk­ing about mu­ta­tion, be­cause such a thing is sim­ply not al­lowed. But I di­gress. The point is that in most of the ­soft­ware de­fined with ob­ject­s, they mu­tate, they change their in­ter­nal state or rep­re­sen­ta­tion. It’s true that we could have im­mutable ob­ject­s, but that’s usu­al­ly not the case.

Con­sid­er the fol­low­ing code (spoil­er: there is some­thing re­al­ly wrong with it).

sub­tleties0.py (Source)

"""Mutable objects as class attributes."""
import logging

logger = logging.getLogger(__name__)


class Query:
    PARAMETERS = {"limit": 100, "offset": 0}

    def run_query(self, query, limit=None, offset=None):
        if limit is not None:
            self.PARAMETERS.update(limit=limit)
        if offset is not None:
            self.PARAMETERS.update(offset=offset)
        return self._run(query, **self.PARAMETERS)

    @staticmethod
    def _run(query, limit, offset):
        logger.info("running %s [%s, %s]", query, limit, offset)

When try­ing to use dic­tio­nar­ies to pass key­word ar­gu­ments, it might be tempt­ing ­to mu­tate them in or­der to adapt it to the sig­na­ture of the func­tion we want to ­cal­l. But be­fore mu­tat­ing an ob­jec­t, we should think on it’s scope and the ­con­se­quences this mu­ta­tion will bring.

In this case, the dic­tio­nary that the method mu­tates be­longs to the class. Chang­ing it, will change the class. Af­ter it has been mu­tat­ed, the de­fault­ ­val­ues will re­main from the last time it was up­dat­ed. More­over, since this ­dic­tionary be­longs to the class, all in­stances (in­clud­ing new ones), will car­ry on with this:

>>> q = Query()
>>> q.run_query("select 1")
running select 1 [100, 0]

>>> q.run_query("select 1", limit=50)
running select 1 [50, 0]

>>> q.run_query("select 1")
running select 1 [50, 0]

>>> q.PARAMETERS
{'limit': 50, 'offset': 0}

>>> new_query = Query()
>>> new_query.PARAMETERS
{'limit': 50, 'offset': 0}

As we can ob­serve, this is not what we wan­t, and it’s ex­treme­ly frag­ile.

Here are some gen­er­al rec­om­men­da­tion­s:

  1. Don’t change mu­ta­ble ob­jects passed by pa­ram­e­ter to func­tion­s. Cre­ate new ­copies of the ob­jects when­ev­er pos­si­ble, and re­turn them ac­cord­ing­ly.
  2. Don’t mu­tate class at­tributes.
  3. Try not to set mu­ta­ble ob­jects as class at­tributes.

Need­less to say, there are ex­cep­tions to these rules, af­ter all prag­ma­tism beats pu­ri­ty. Here are some ob­vi­ous ex­cep­tions to these rules [2]: :

  • For item 1, we always have to consider the trade-off memory/speed. If the object it’s too big (perhaps a large dictionary), running copy.deepcopy() on it will be slow, and it will take a lot of memory, so it’s probably faster to just modify it in place.
  • The ob­vi­ous ex­cep­tion for rule [2] is when us­ing de­scrip­tors, but be­cause it’s like­ly an sce­nario on which we are count­ing on that side ef­fec­t. Oth­er than that, there should­n’t be any rea­son to go on such a dan­ger­ous path.
  • Rule [3] should­n’t be a prob­lem if the at­tributes are read­-on­ly, and we’re ­sure they are nev­er go­ing to be mu­tat­ed. In that case, set­ting dic­tio­nar­ies and lists as class at­tributes, might be fine, but I’m on­ly wor­ried about the ­fact that even though you can be sure that right now there is no method­ ­mu­tat­ing them, you can’t as­sure no­body will break that rule in the fu­ture.

Iterators

The it­er­a­tor pro­to­col in Python is a great fea­ture. It al­lows us to treat an en­tire set of ob­jects by their be­hav­iour re­gard­less of their in­ter­nal rep­re­sen­ta­tion.

For in­stance we can write a code like:

for i in myiterable: ...

And we don’t know ex­act­ly what myit­er­able is. It might be a list, a tu­ple, a dic­tio­nary, a string, and it will still work.

We can al­so re­ly on all meth­ods that use this pro­to­col,

mylist.extend(myiterable)

Of course mylist has to be a list, but again myit­er­able can be any­thing that can be it­er­at­ed over.

Un­for­tu­nate­ly, this amaz­ing fea­ture it’s al­so the cause of some sub­tle headaches.

A long time ago, I re­mem­ber hear­ing a com­plaint from a co-­work­er say­ing that a ­part of the code was par­tic­u­lar­ly slow. The code in ques­tion was a func­tion that was sup­posed to process and then move files to a giv­en tar­get di­rec­to­ry. Y­ou can imag­ine some­thing like this:

def process_files(files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

Now what hap­pen­s? Like in the in­tro­duc­tion, we are us­ing the it­er­a­tor pro­to­col, ­so we re­ly on the fact that we don’t ex­act­ly know what files_­to_pro­cess is ex­act­ly (a tu­ple, a list, etc.)

There is a sub­tle is­sue. Strings are al­so it­er­able. If you pass a sin­gle ­file, let’s say /home­/ubun­tu/­foo, each char­ac­ter will be it­er­at­ed over, s­tart­ing with / (the root di­rec­to­ry), fol­low­ing with h, etc. That’s why the pro­grams was slow. It was copy­ing the en­tire file sys­tem!

The so­lu­tion is to use a bet­ter in­ter­face, that dis­al­low these er­rors en­tire­ly:

def process_files(*files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

In this ex­am­ple the sig­na­ture of the func­tion ex­pos­es a much nicer in­ter­face, in the sense that it can al­low one or mul­ti­ple files as ar­gu­ments, with­out the is­sue of the pre­vi­ous ex­am­ple. This change al­so makes tar­get_di­rec­to­ry to be key­word-on­ly, which is more ex­plic­it as well.

Closing words

I hope you en­joyed the con­tent, and that you got an idea of how crit­i­cal some de­tails can be.

The re­al val­ue of clean code is to avoid dis­as­ter­s. Top­ics like these ones (and ­more), are cov­ered in my lat­est ti­tle, should the read­er be in­ter­est­ed.

[1] I’m not exaggerating when I say disaster. There are many famous examples of crashes in software as a result of edge cases, or errors that can be narrowed down to a single line of code.
[2] The list is by no means exhaustive.