Subtleties of Python

In our pro­fe­s­sio­n, atten­tion to de­tail is of ut­most im­por­tan­ce. Any good ­so­ftwa­re en­gi­neer un­ders­tan­ds that de­tails ma­tter a lo­t, as they make the ­di­ffe­ren­ce be­tween a wo­rking unit or a di­sas­ter 1.

This is why clean co­de is not just about for­ma­tting or arran­ging the co­de. It’s ­nei­ther a foi­ble. It is ins­tea­d, pa­ying atten­tion exac­tly to tho­se de­tails tha­t wi­ll make a big di­ffe­ren­ce in pro­duc­tio­n.

Le­t’s see so­me exam­ples of this in Py­tho­n.

Dis­clai­me­r: The fo­llo­wing ca­ses being pre­sen­ted are just exam­ple­s. They are no­t ­meant to be in­ter­pre­ted as pa­ttern­s, in the sen­se of “e­ve­ry ti­me you find X, a­pply Y”.

Be­low the­re are two short ca­ses of pro­ble­ma­tic co­de I ha­ve found in whils­t wo­rking on past co­de ba­ses.

Mutable Objects and Attributes

This is one of the main rea­sons why I like func­tio­nal pro­gra­m­min­g: i­m­mu­ta­bi­li­ty.

In func­tio­nal so­ftwa­re, we would­n’t be ta­lking about mu­ta­tio­n, be­cau­se su­ch a ­thing is sim­ply not allo­we­d. But I di­gress. The point is that in most of the ­so­ftwa­re de­fi­ned wi­th ob­jec­ts, they mu­ta­te, they chan­ge their in­ter­nal sta­te or ­re­pre­sen­ta­tio­n. It’s true that we could ha­ve im­mu­ta­ble ob­jec­ts, but tha­t’s u­sua­lly not the ca­se.

Con­si­der the fo­llo­wing co­de (s­poi­le­r: the­re is so­me­thing rea­lly wrong wi­th it).

sub­tle­tie­s0.­py (Sour­ce)

"""Mutable objects as class attributes."""
import logging

logger = logging.getLogger(__name__)


class Query:
    PARAMETERS = {"limit": 100, "offset": 0}

    def run_query(self, query, limit=None, offset=None):
        if limit is not None:
            self.PARAMETERS.update(limit=limit)
        if offset is not None:
            self.PARAMETERS.update(offset=offset)
        return self._run(query, **self.PARAMETERS)

    @staticmethod
    def _run(query, limit, offset):
        logger.info("running %s [%s, %s]", query, limit, offset)

When tr­ying to use dic­tio­na­ries to pa­ss ke­yword ar­gu­men­ts, it mi­ght be temp­tin­g ­to mu­ta­te them in or­der to adapt it to the sig­na­tu­re of the func­tion we want to­ ­ca­ll. But be­fo­re mu­ta­ting an ob­jec­t, we should thi­nk on it’s sco­pe and the ­con­se­quen­ces this mu­ta­tion wi­ll brin­g.

In this ca­se, the dic­tio­na­ry that the me­thod mu­ta­tes be­longs to the cla­ss. ­Chan­ging it, wi­ll chan­ge the cla­ss. After it has been mu­tate­d, the de­faul­t ­va­lues wi­ll re­main from the last ti­me it was up­date­d. Mo­reo­ve­r, sin­ce this ­dic­tio­na­ry be­longs to the cla­ss, all ins­tan­ces (in­clu­ding new ones), wi­ll ca­rr­y on wi­th this:

>>> q = Query()
>>> q.run_query("select 1")
running select 1 [100, 0]

>>> q.run_query("select 1", limit=50)
running select 1 [50, 0]

>>> q.run_query("select 1")
running select 1 [50, 0]

>>> q.PARAMETERS
{'limit': 50, 'offset': 0}

>>> new_query = Query()
>>> new_query.PARAMETERS
{'limit': 50, 'offset': 0}

As we can ob­ser­ve, this is not what we wan­t, and it’s ex­tre­me­ly fra­gi­le.

He­re are so­me ge­ne­ral re­co­m­men­da­tion­s:

  1. Do­­n’t chan­­ge mu­­ta­­ble ob­­je­c­­ts pa­ss­ed by pa­­ra­­me­­ter to fun­c­­tio­n­s. Crea­­te new ­­co­­­pies of the ob­­je­c­­ts whe­­ne­­ver po­­s­­si­­ble, and re­­turn them ac­­co­r­­di­n­­gl­­y.

  2. Do­­n’t mu­­ta­­te cla­ss attri­­bu­­tes.

  3. Try not to set mu­­ta­­ble ob­­je­c­­ts as cla­ss attri­­bu­­tes.

Nee­d­le­ss to sa­y, the­re are ex­cep­tions to the­se ru­le­s, after all prag­ma­tis­m ­bea­ts pu­ri­ty. He­re are so­me ob­vious ex­cep­tions to the­se ru­les 2: :

  • For item 1, we alwa­ys ha­ve to con­si­der the tra­de-o­ff me­mo­r­y/s­pee­d. If the ob­ject it’s too big (perhaps a lar­ge dic­tio­na­r­y), run­nin­g co­p­y.­deep­co­p­y() on it wi­ll be slo­w, and it wi­ll take a lot of me­mo­r­y, ­so it’s pro­ba­bly fas­ter to just mo­di­fy it in pla­ce.

  • The ob­­vious ex­­ce­p­­tion for ru­­le [2] is when using des­­cri­p­­to­r­s, but be­­­cau­se i­­t’s like­­ly an sce­­na­­rio on whi­­ch we are coun­­ting on that si­­de effe­c­­t. Othe­­r ­­than tha­­t, the­­re shoul­d­n’t be any rea­­son to go on su­­ch a dan­­ge­­rous pa­­th.

  • Ru­­le [3] shoul­d­n’t be a pro­­­blem if the attri­­bu­­tes are rea­­d-o­n­­l­­y, and we’­­re ­­su­­re they are ne­­ver going to be mu­­ta­te­­d. In that ca­se, se­­tting di­c­­tio­­­na­­rie­s and lis­­ts as cla­ss attri­­bu­­tes, mi­­ght be fi­­ne, but I’m on­­ly wo­­­rried about the ­­fact that even thou­­gh you can be su­­re that ri­­ght now the­­re is no me­­tho­­­d ­­mu­­ta­­ting the­­m, you can’t as­­su­­re no­­­body wi­­ll break that ru­­le in the fu­­tu­­re.

Iterators

The ite­ra­tor pro­to­col in Py­thon is a great fea­tu­re. It allo­ws us to treat an en­ti­re set of ob­jec­ts by their be­ha­viour re­gard­le­ss of their in­ter­na­l ­re­pre­sen­ta­tio­n.

For ins­tan­ce we can wri­te a co­de like:

for i in myiterable: ...

And we don’t know exactly what myiterable is. It might be a list, a tuple, a dictionary, a string, and it will still work.

We can al­so re­ly on all me­tho­ds that use this pro­to­co­l,

mylist.extend(myiterable)

Of course mylist has to be a list, but again myiterable can be anything that can be iterated over.

Un­for­tu­na­te­l­y, this ama­zing fea­tu­re it’s al­so the cau­se of so­me sub­tle hea­da­che­s.

A long ti­me ago, I re­mem­ber hea­ring a com­plaint from a co­-wo­rker sa­ying that a ­part of the co­de was par­ti­cu­lar­ly slo­w. The co­de in ques­tion was a func­tio­n ­that was su­ppo­sed to pro­ce­ss and then mo­ve fi­les to a gi­ven tar­get di­rec­to­r­y. ­You can ima­gi­ne so­me­thing like this:

def process_files(files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

Now what happens? Like in the introduction, we are using the iterator protocol, so we rely on the fact that we don’t exactly know what files_to_process is exactly (a tuple, a list, etc.)

There is a subtle issue. Strings are also iterable. If you pass a single file, let’s say /home/ubuntu/foo, each character will be iterated over, starting with / (the root directory), following with h, etc. That’s why the programs was slow. It was copying the entire file system!

The so­lu­tion is to use a be­tter in­ter­fa­ce, that di­sa­llow the­se errors en­ti­re­l­y:

def process_files(*files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

In this example the signature of the function exposes a much nicer interface, in the sense that it can allow one or multiple files as arguments, without the issue of the previous example. This change also makes target_directory to be keyword-only, which is more explicit as well.

Closing words

I ho­pe you en­jo­yed the con­ten­t, and that you got an idea of how cri­ti­cal so­me ­de­tails can be.

The real va­lue of clean co­de is to avoid di­sas­ter­s. To­pi­cs like the­se ones (an­d ­mo­re), are co­ve­red in my la­test ti­tle, s­hould the rea­der be in­te­res­te­d.

1

I’m not exa­gge­ra­ting when I say di­sas­te­r. The­re are many fa­mous exam­ple­s of cras­hes in so­ftwa­re as a re­sult of edge ca­ses, or errors that can be­ ­na­rro­wed do­wn to a sin­gle li­ne of co­de.

2

The list is by no means exhaus­ti­ve.