Subtleties of Python

In our pro­fe­s­sio­n, atten­tion to de­tail is of ut­most im­por­tan­ce. Any good ­so­ftwa­re en­gi­neer un­ders­tan­ds that de­tails ma­tter a lo­t, as they make the ­di­ffe­ren­ce be­tween a wo­rking unit or a di­sas­ter [1].

This is why clean co­de is not just about for­ma­tting or arran­ging the co­de. It’s ­nei­ther a foi­ble. It is ins­tea­d, pa­ying atten­tion exac­tly to tho­se de­tails tha­t wi­ll make a big di­ffe­ren­ce in pro­duc­tio­n.

Le­t’s see so­me exam­ples of this in Py­tho­n.

Dis­clai­me­r: The fo­llo­wing ca­ses being pre­sen­ted are just exam­ple­s. They are no­t ­meant to be in­ter­pre­ted as pa­ttern­s, in the sen­se of “e­ve­ry ti­me you find X, a­pply Y”.

Be­low the­re are two short ca­ses of pro­ble­ma­tic co­de I ha­ve found in whils­t wo­rking on past co­de ba­ses.

Mutable Objects and Attributes

This is one of the main rea­sons why I like func­tio­nal pro­gra­m­min­g: i­m­mu­ta­bi­li­ty.

In func­tio­nal so­ftwa­re, we would­n’t be ta­lking about mu­ta­tio­n, be­cau­se su­ch a ­thing is sim­ply not allo­we­d. But I di­gress. The point is that in most of the ­so­ftwa­re de­fi­ned wi­th ob­jec­ts, they mu­ta­te, they chan­ge their in­ter­nal sta­te or ­re­pre­sen­ta­tio­n. It’s true that we could ha­ve im­mu­ta­ble ob­jec­ts, but tha­t’s u­sua­lly not the ca­se.

Con­si­der the fo­llo­wing co­de (s­poi­le­r: the­re is so­me­thing rea­lly wrong wi­th it).

sub­tle­tie­s0.­py (Sour­ce)

"""Mutable objects as class attributes."""
import logging

logger = logging.getLogger(__name__)


class Query:
    PARAMETERS = {"limit": 100, "offset": 0}

    def run_query(self, query, limit=None, offset=None):
        if limit is not None:
            self.PARAMETERS.update(limit=limit)
        if offset is not None:
            self.PARAMETERS.update(offset=offset)
        return self._run(query, **self.PARAMETERS)

    @staticmethod
    def _run(query, limit, offset):
        logger.info("running %s [%s, %s]", query, limit, offset)

When tr­ying to use dic­tio­na­ries to pa­ss ke­yword ar­gu­men­ts, it mi­ght be temp­tin­g ­to mu­ta­te them in or­der to adapt it to the sig­na­tu­re of the func­tion we want to­ ­ca­ll. But be­fo­re mu­ta­ting an ob­jec­t, we should thi­nk on it’s sco­pe and the ­con­se­quen­ces this mu­ta­tion wi­ll brin­g.

In this ca­se, the dic­tio­na­ry that the me­thod mu­ta­tes be­longs to the cla­ss. ­Chan­ging it, wi­ll chan­ge the cla­ss. After it has been mu­tate­d, the de­faul­t ­va­lues wi­ll re­main from the last ti­me it was up­date­d. Mo­reo­ve­r, sin­ce this ­dic­tio­na­ry be­longs to the cla­ss, all ins­tan­ces (in­clu­ding new ones), wi­ll ca­rr­y on wi­th this:

>>> q = Query()
>>> q.run_query("select 1")
running select 1 [100, 0]

>>> q.run_query("select 1", limit=50)
running select 1 [50, 0]

>>> q.run_query("select 1")
running select 1 [50, 0]

>>> q.PARAMETERS
{'limit': 50, 'offset': 0}

>>> new_query = Query()
>>> new_query.PARAMETERS
{'limit': 50, 'offset': 0}

As we can ob­ser­ve, this is not what we wan­t, and it’s ex­tre­me­ly fra­gi­le.

He­re are so­me ge­ne­ral re­co­m­men­da­tion­s:

  1. Do­n’t chan­ge mu­ta­ble ob­jec­ts pa­ss­ed by pa­ra­me­ter to func­tion­s. Crea­te new ­co­pies of the ob­jec­ts whe­ne­ver po­s­si­ble, and re­turn them ac­cor­din­gl­y.
  2. Do­n’t mu­ta­te cla­ss attri­bu­tes.
  3. Try not to set mu­ta­ble ob­jec­ts as cla­ss attri­bu­tes.

Nee­d­le­ss to sa­y, the­re are ex­cep­tions to the­se ru­le­s, after all prag­ma­tis­m ­bea­ts pu­ri­ty. He­re are so­me ob­vious ex­cep­tions to the­se ru­les [2]: :

  • For item 1, we always have to consider the trade-off memory/speed. If the object it’s too big (perhaps a large dictionary), running copy.deepcopy() on it will be slow, and it will take a lot of memory, so it’s probably faster to just modify it in place.
  • The ob­vious ex­cep­tion for ru­le [2] is when using des­crip­tor­s, but be­cau­se i­t’s like­ly an sce­na­rio on whi­ch we are coun­ting on that si­de effec­t. Othe­r ­than tha­t, the­re should­n’t be any rea­son to go on su­ch a dan­ge­rous pa­th.
  • Ru­le [3] should­n’t be a pro­blem if the attri­bu­tes are rea­d-on­l­y, and we’­re ­su­re they are ne­ver going to be mu­tate­d. In that ca­se, se­tting dic­tio­na­rie­s and lis­ts as cla­ss attri­bu­tes, mi­ght be fi­ne, but I’m on­ly wo­rried about the ­fact that even thou­gh you can be su­re that ri­ght now the­re is no me­tho­d ­mu­ta­ting the­m, you can’t as­su­re no­body wi­ll break that ru­le in the fu­tu­re.

Iterators

The ite­ra­tor pro­to­col in Py­thon is a great fea­tu­re. It allo­ws us to treat an en­ti­re set of ob­jec­ts by their be­ha­viour re­gard­le­ss of their in­ter­na­l ­re­pre­sen­ta­tio­n.

For ins­tan­ce we can wri­te a co­de like:

for i in myiterable: ...

And we do­n’t know exac­tly what myi­te­ra­ble is. It mi­ght be a lis­t, a tu­ple, a dic­tio­na­r­y, a strin­g, and it wi­ll sti­ll wo­rk.

We can al­so re­ly on all me­tho­ds that use this pro­to­co­l,

mylist.extend(myiterable)

Of cour­se my­list has to be a lis­t, but again myi­te­ra­ble can be an­y­thin­g ­that can be ite­ra­ted ove­r.

Un­for­tu­na­te­l­y, this ama­zing fea­tu­re it’s al­so the cau­se of so­me sub­tle hea­da­che­s.

A long ti­me ago, I re­mem­ber hea­ring a com­plaint from a co­-wo­rker sa­ying that a ­part of the co­de was par­ti­cu­lar­ly slo­w. The co­de in ques­tion was a func­tio­n ­that was su­ppo­sed to pro­ce­ss and then mo­ve fi­les to a gi­ven tar­get di­rec­to­r­y. ­You can ima­gi­ne so­me­thing like this:

def process_files(files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

Now what ha­ppen­s? Like in the in­tro­duc­tio­n, we are using the ite­ra­tor pro­to­co­l, ­so we re­ly on the fact that we do­n’t exac­tly know what fi­le­s_­to­_­pro­ce­ss is e­xac­tly (a tu­ple, a lis­t, etc.)

The­re is a sub­tle is­sue. Strings are al­so ite­ra­ble. If you pa­ss a sin­gle ­fi­le, le­t’s say /ho­me/u­bun­tu/­foo, ea­ch cha­rac­ter wi­ll be ite­ra­ted ove­r, s­tar­ting wi­th / (the root di­rec­to­r­y), fo­llo­wing wi­th h, etc. Tha­t’s wh­y ­the pro­gra­ms was slo­w. It was co­p­ying the en­ti­re fi­le sys­te­m!

The so­lu­tion is to use a be­tter in­ter­fa­ce, that di­sa­llow the­se errors en­ti­re­l­y:

def process_files(*files_to_process, target_directory):
    for file_ in files_to_process:
        # ...
        shutil.copy2(file_, target_directory)

In this exam­ple the sig­na­tu­re of the func­tion ex­po­ses a mu­ch ni­cer in­ter­fa­ce, in the sen­se that it can allow one or mul­ti­ple fi­les as ar­gu­men­ts, wi­thout the is­sue of the pre­vious exam­ple. This chan­ge al­so makes tar­ge­t_­di­rec­to­ry to­ ­be ke­ywor­d-on­l­y, whi­ch is mo­re ex­pli­cit as we­ll.

Closing words

I ho­pe you en­jo­yed the con­ten­t, and that you got an idea of how cri­ti­cal so­me ­de­tails can be.

The real va­lue of clean co­de is to avoid di­sas­ter­s. To­pi­cs like the­se ones (an­d ­mo­re), are co­ve­red in my la­test ti­tle, s­hould the rea­der be in­te­res­te­d.

[1] I’m not exaggerating when I say disaster. There are many famous examples of crashes in software as a result of edge cases, or errors that can be narrowed down to a single line of code.
[2] The list is by no means exhaustive.