Exploring Generators and Coroutines

Let’s revisit the idea of generators in Python, in order to understand how the support for coroutines was achieved in the latest versions of Python (3.6, at the time of this writing).

By reviewing the milestones of generators, chronologically, we can get a better idea of the evolution that led to asynchronous programming in Python.

We will review the main changes in Python that relate to generators and asynchronous programming, starting with PEP-255 (Simple Generators), PEP-342 (Coroutines via Enhanced Generators), PEP-380 (Syntax for Delegating to a Sub-Generator), and finishing with PEP-525 (Asynchronous Generators).

Simple Generators

PEP-255 introduced generators to Python. The idea is that when we process some data, we don’t actually need all of it to be in memory at once. Most of the time, having one value at a time is enough. Lazy evaluation is a good trait to have in software, because in this case it means that less memory is used. It’s also a key concept in other programming languages, and one of the main ideas behind functional programming.

The new yield keyword was added to Python, with the meaning of producing an element that will be consumed by a caller function.

The mere presence of the yield keyword anywhere in a function automatically makes it a generator function. When called, this function will create a generator object, which can be advanced, producing its elements one at a time. By calling the generator successive times with the next() function, the generator advances to the next yield statement, producing values. After the generator has produced a value, it is suspended, waiting to be called again.

Take the range built-in function, for example. In Python 2, this function returns a list with all the numbers in the interval. Imagine we want to come up with a similar implementation of it, in order to get the sum of all numbers up to a certain limit.

LIMIT = 1_000_000
def old_range(n):
    numbers = []
    i = 0
    while i < n:
        numbers.append(i)
        i += 1
    return numbers

print(sum(old_range(LIMIT)))

Now let’s see how much memory is used:

$ /usr/bin/time -f %M python rangesum.py
499999500000
48628

The first number is the result of the print, whilst the second one is the output of the time command, printing out the memory used by the program (~48 MiB).

Now, what if this is implemented with a generator instead?

We just have to get rid of the list, and place the yield statement instead, indicating that we want to produce the value of the expression that follows the keyword.

LIMIT = 1_000_000
def new_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

print(sum(new_range(LIMIT)))

This time, the result is:

$ /usr/bin/time -f %M python rangesum.py
499999500000
8992

We see a huge difference: the implementation that holds all the numbers in a list in memory uses ~48 MiB, whereas the implementation that just uses one number at a time uses much less memory (< 9 MiB) [1].

We see the idea: when the yield <expression> is reached, the result of the expression will be passed to the caller code, and the generator will remain suspended at that line in the meantime.

>>> import inspect
>>> r = new_range(1_000_000)
>>> inspect.getgeneratorstate(r)
'GEN_CREATED'
>>> next(r)
0
>>> next(r)
1
>>> inspect.getgeneratorstate(r)
'GEN_SUSPENDED'

Generators are iterable objects. An iterable is an object whose __iter__ method constructs a new iterator every time it is called (with iter(it), for instance). An iterator is an object whose __iter__ returns itself, and whose __next__ method contains the logic to produce new values each time it is called, and to signal the stop (by raising StopIteration).

The idea of iterables is that they advance through values when we call the built-in next() function on them, and this will produce values until the StopIteration exception is raised, signalling the end of the iteration.
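To make the protocol more concrete, here is a minimal, class-based sketch equivalent to new_range() (the class name NewRange is just illustrative; the generator version lets Python write this boilerplate for us):

class NewRange:
    """A class-based iterator equivalent to the new_range() generator."""

    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        # An iterator's __iter__ returns the object itself
        return self

    def __next__(self):
        # Produce the next value, or signal the end of the iteration
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value

Calling sum(NewRange(LIMIT)) behaves just like sum(new_range(LIMIT)).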

>>> def f():
...     yield 1
...     yield 2

>>> g = f()
>>> next(g)
1
>>> next(g)
2
>>> next(g)
Traceback (most recent call last):
  ...
StopIteration

>>> list(f())
[1, 2]

In the first case, calling f() creates a new generator. The first two calls to next() will advance it until the next yield statement, producing the values they have set. When there is nothing else to produce, the StopIteration exception is raised. Something similar to this is actually run when we iterate over this object in the form of for x in iterable: …, only that Python internally handles the exception that determines when the for loop stops.
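In fact, a for loop can be roughly sketched as follows (simplified, just to show where the exception is handled):

iterator = iter(iterable)
while True:
    try:
        x = next(iterator)
    except StopIteration:
        break  # the for loop ends here, silently
    ...  # the body of the for loop runs with x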

Before wrapping up the introduction to generators, I want to make a quick comment, and highlight something important about the role of generators in the language, and why they’re such a neat abstraction to have.

Instead of using the eager version (the one that stores everything in a list), you might consider avoiding that by just using a loop and counting inside it. It’s like saying “all I need is just the count, so I might as well just accumulate the value in a loop, and that’s it”. Something slightly similar to:

total = 0
i = 0
while i < LIMIT:
    total += i
    i += 1

This is something I might consider doing in a language that doesn’t have generators. Don’t do this. Generators are the right way to go. By using a generator, we’re doing more than just wrapping the code of an iteration; we’re creating a sequence (which could even be infinite), and naming it. This sequence is an object we can use in the rest of the code. It’s an abstraction. As such, we can combine it with the rest of the code (for example, to filter on it), reuse it, pass it along to other objects, and more.

For example, let’s say we have the sequence created with new_range(), and then we realize that we need the first 10 even numbers from it. This is as simple as doing:

>>> import itertools
>>> rg = new_range(1_000_000)
>>> itertools.islice(filter(lambda n: n % 2 == 0, rg), 10)

And this is something we could not so easily accomplish, had we chosen to ignore generators.

For years, this was pretty much all there was to generators in Python. Generators were introduced with the idea of iteration and lazy computation in mind.

Later on, there was another enhancement, by PEP-342, adding more methods to them, with the goal of supporting coroutines.

Coroutines

Roughly speaking, the idea of coroutines is to pause the execution of a function at a given point, from where it can be later resumed. The idea is that while a coroutine is suspended, the program can switch to run another part of the code. Basically, we need functions that can be paused.

As we have seen from the previous example, generators have this feature: when the yield <expression> is reached, a value is produced to the caller object, and in the meantime the generator object is suspended. This suggested that generators can be used to support coroutines in Python, hence the name of the PEP being “Coroutines via Enhanced Generators”.

There is more, though. Coroutines have to support being resumed from multiple entry points to continue their execution. Therefore, more changes are required. We need to be able to pass data back to them, and handle exceptions. For this, more methods were added to their interface.

  • send(<value>)
  • throw(ex_type[, ex_value[, ex_traceback]])
  • close()

These methods allow sending a value to a generator, throwing an exception inside it, and closing it, respectively.

The send() method implies that yield becomes an expression, rather than a statement (as it was before). With this, it is possible to assign the result of a yield to a variable, and the value will be whatever was sent to it.

>>> def gen(start=0):
...     step = start
...     while True:
...         value = yield step
...         print(f"Got {value}")
...         step += 1
...
>>> g =  gen(1)
>>> next(g)
1
>>> g.send("hello")
Got hello
2
>>> g.send(42)
Got 42
3

As we can see from the previous code, the value produced by yield becomes the result of the send() call (in this case, the consecutive numbers of the sequence), while the value passed as the parameter of send() is what the yield expression evaluates to inside the generator; it is assigned to value and printed out on the next line.

Before sending any values to the generator, it has to be advanced to the next yield. In fact, advancing is the only allowed operation on a newly-created generator. This can be done by calling next(g) or g.send(None), which are equivalent.

Warning

Remember to always advance a generator that was just created, or you will get a TypeError.
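For instance, reusing the gen generator defined above (the exact error message may vary slightly between Python versions):

>>> g = gen()
>>> g.send("hello")
Traceback (most recent call last):
  ...
TypeError: can't send non-None value to a just-started generator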

With the .throw() method, the caller can make the generator raise an exception at the point where it is suspended. If this exception is handled internally in the generator, it will continue normally, and the return value will be that of the next yield line that is reached. If it’s not handled by the generator, it will fail, and the exception will propagate to the caller.
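As a small sketch of both situations (the generator and the exceptions here are made up purely for illustration):

>>> def resilient():
...     try:
...         yield "waiting"
...     except ValueError:
...         yield "recovered from ValueError"

>>> g = resilient()
>>> next(g)
'waiting'
>>> g.throw(ValueError("something went wrong"))  # handled: the next yield is returned
'recovered from ValueError'
>>> g2 = resilient()
>>> next(g2)
'waiting'
>>> g2.throw(KeyError("not handled"))  # not handled: it propagates to the caller
Traceback (most recent call last):
  ...
KeyError: 'not handled'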

The .close() method is used to terminate the generator. It will raise the GeneratorExit exception inside the generator. If we wish to run some clean-up code, this is the exception to handle. When handling this exception, the only allowed action is to return a value.
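A minimal sketch of this clean-up pattern (the names are illustrative):

>>> def stream_values():
...     try:
...         while True:
...             yield "data"
...     except GeneratorExit:
...         # run the clean-up code; returning (or simply falling through) is fine,
...         # but yielding again here would raise a RuntimeError
...         print("releasing resources")

>>> g = stream_values()
>>> next(g)
'data'
>>> g.close()
releasing resources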

With these additions, generators have now evolved into coroutines. This means our code can now support concurrent programming, suspend the execution of tasks, perform non-blocking I/O, and so on.

While this works, handling many coroutines, refactoring generators, and organizing the code became a bit cumbersome. More work had to be done if we wanted to keep a Pythonic way of doing concurrent programming.

More Coroutines

PEP-380 added more changes to coroutines, this time with the goal of supporting delegation to sub-generators. Two main things changed in generators to make them more useful as coroutines:

  • Generators can now return values.
  • The yield from syntax.

Return Values in Generators

The keyword def defines a function, which returns values (with the return keyword). However, as stated in the first section, if that def contains a yield, it is a generator function. Before this PEP, it would have been a syntax error to have a return in a generator function (a function that also has a yield). However, this is no longer the case.

Remember how generators stop by raising StopIteration. What does it mean that a generator returns a value? It means that it stops. And where does that value go? It’s contained inside the exception, as an attribute in StopIteration.value.

def gen():
    yield 1
    yield 2
    return "returned value"

>>> g = gen()
>>> try:
...     while True:
...         print(next(g))
... except StopIteration as e:
...     print(e.value)
...
1
2
returned value

Notice that the value returned by the generator is stored inside the exception, in StopIteration.value. This might sound like it is not the most elegant solution, but doing so preserves the original interface, and the protocol remains unchanged. It’s still the same kind of exception signalling the end of the iteration.

yield from

Another syntax change to the language.

In its most basic form, the construction yield from <iterable> can be thought of as:

for e in iterable:
    yield e

Basically, this means that it extends an iterable, yielding all the elements that this internal iterable can produce.

For example, this way we could create a clone of the itertools.chain function from the standard library:

>>> def chain2(*iterables):
...     for it in iterables:
...         yield from it

>>> list(chain2("hello", " ", "world"))
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

However, saving two lines of code is not the reason why this construction was added to the language. The raison d’être of this construction is to actually delegate responsibility to smaller generators, and chain them.

>>> def internal(name, limit):
...     for i in range(limit):
...         got = yield i
...         print(f"{name} got: {got}")
...     return f"{name} finished"

>>> def gen():
...     yield from internal("A", 3)
...     return (yield from internal("B", 2))

>>> g = gen()
>>> next(g)
0
>>> g.send(1)
A got: 1
1

>>> g.send(1)   # a few more calls until the generator ends
B got: 1
Traceback (most recent call last):
  ...
StopIteration: B finished

Here we see how yield from handles proper delegation to an internal generator. Notice that we never send values directly to internal, but to gen instead, and these values end up in the nested generator. What yield from is actually doing is creating a generator that has a channel to all the nested generators. Values produced by these will be provided to the caller of gen. Values sent to it will be passed along to the internal generators (the same goes for exceptions). Even the return value is handled, and becomes the return value of the top-level generator (in this case, the string that states the name of the last generator becomes the resulting StopIteration.value).

We see now the real value of this construction. With this, it’s easier to refactor generators into smaller pieces, compose them, and chain them together while preserving the behaviour of coroutines.

The new yield from syntax is a great step towards supporting better concurrency. We can now think of generators as being “lightweight threads” that delegate functionality to an internal generator and pause the execution, so that other things can be computed in that time.

Because syntactically generators are like coroutines, it was possible to accidentally confuse them, and end up placing a generator where a coroutine would have been expected (the yield from would accept it, after all). For this reason, the next step was to actually define the concept of a coroutine as a proper type. With this change, it also followed that yield from evolved into await, and a new syntax for defining coroutines was introduced: async.

async def / await

A quick note on how this relates to asynchronous programming in Python.

In asyncio, or any other event loop, the idea is that we define coroutines, and make them part of the event loop. Broadly speaking, the event loop will keep a list of the tasks (which wrap our coroutines) that have to run, and will schedule them to run.

In our coroutines, we delegate the I/O functionality we want to achieve to some other coroutine or awaitable object, by calling yield from or await on it.

Then the event loop will call our coroutine, which will reach this line, delegating to the internal coroutine and pausing the execution, which gives the control back to the scheduler (so it can run another coroutine). The event loop will monitor the future object that wraps our coroutine until it is finished, and when needed, it will update it by calling the .send() method on it, which in turn will pass it along to the internal coroutine, and so on.

Before the new syntax for async and await was introduced, coroutines were defined as generators decorated with asyncio.coroutine (types.coroutine was added in Python 3.5, when the coroutine type itself was created). Nowadays, async def creates a native coroutine, and inside it, only the await expression is accepted (not yield from).

The following two coroutines, step and coro, are a simple example of how await works similarly to yield from, delegating the values to the internal generator.

>>> import types

>>> @types.coroutine
... def step():
...     s = 0
...     while True:
...         value = yield s
...         print("Step got value ", value)
...         s += 1

>>> async def coro():
...     while True:
...         got = await step()
...         print(got)


>>> c = coro()
>>> c.send(None)
0
>>> c.send("first")
Step got value  first
1

>>> c.send("second")
Step got value  second
2

>>> c.send("third")
Step got value  third
3

Once again, like in the yield from example, when we send a value to coro, this reaches the await instruction, which means that it will be passed to the step coroutine. In this simple example, coro is something like what we would write, while step would be an external function we call.

The following two snippets show different ways of defining coroutines.

# py 3.4
@asyncio.coroutine
def coroutine():
    yield from asyncio.sleep(1)

# py 3.5+
async def coroutine():
    await asyncio.sleep(1)

Basically, this means that this asynchronous way of programming is kind of like an API for working with event loops. It doesn’t really relate to asyncio in particular; we could use any event loop (curio, uvloop, and so on) for this. The important part is to understand that an event loop will call our coroutine, which will eventually reach the line where we defined the await, and this will delegate to an external function (in this case, asyncio.sleep). When the event loop calls send(), this is also passed along, and the await gives back control to the event loop, so a different coroutine can run.
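As a minimal sketch of how this fits together (using asyncio’s pre-3.7 event loop API to drive the coroutine; this is just one possible way of running it):

import asyncio

async def coroutine():
    await asyncio.sleep(1)

loop = asyncio.get_event_loop()
# The loop wraps the coroutine in a task and keeps resuming it
# (conceptually, via .send()) until it finishes.
loop.run_until_complete(coroutine())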

The coroutines we define are therefore in between the event loop and the 3rd-party functions that know how to handle the I/O in a non-blocking fashion.

The event loop then works through a chain of await calls. Ultimately, at the end of that chain there is a generator, which pauses the execution of the function and handles the I/O.

In fact, if we check the type of asyncio.sleep, we’ll see that it is indeed a generator:

>>> asyncio.sleep(1)
<generator object sleep at 0x...>

So with this new syntax, does this mean that await is like yield from?

Only with respect to coroutines. It’s correct to write await <coroutine> as well as yield from <coroutine>, but the former won’t work with other iterables (for example, generators that aren’t coroutines, sequences, and so on). Conversely, the latter won’t work with awaitable objects.

The reason for this syntax change is correctness. Actually, it’s not just a syntax change; the new coroutine type is properly defined:

>>> from collections import abc
>>> issubclass(abc.Coroutine, abc.Awaitable)
True

Given that coroutines are syntactically like generators, it would be possible to mix them, and place a generator in asynchronous code where in fact a coroutine was expected. By using await, the type of the object in the expression is checked by Python, and if it doesn’t comply, an exception is raised.
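For example, awaiting a plain generator fails (a quick sketch; the function names are made up, and the exact error message depends on the Python version):

>>> def regular_generator():
...     yield 1

>>> async def faulty():
...     await regular_generator()  # a plain generator is not awaitable

>>> faulty().send(None)
Traceback (most recent call last):
  ...
TypeError: object generator can't be used in 'await' expression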

Asynchronous Generators

In Python 3.5, not only was the proper syntax for coroutines added (async def / await), but also the concept of asynchronous iterators. The idea of having an asynchronous iterable is to iterate while running asynchronous code. For this, new methods such as __aiter__ and __anext__ were added under the concept of asynchronous iterators.

However, there was no support for asynchronous generators. That is analogous to saying that for asynchronous code we had to use iterables (like __iter__ / __next__ in regular code), but we couldn’t use generators (having a yield in an async def function was an error).

This changed in Python 3.6, and now this syntax is supported, with the semantics of a regular generator (lazy evaluation, suspending and producing one element at a time, and so on), while iterating.

Consider this simple example, in which we want to iterate while calling some I/O code that we don’t want to block on.

import asyncio
import logging

logger = logging.getLogger(__name__)  # assuming a module-level logger for the example


async def recv(no, size) -> str:
    """Simulate reading <size> bytes from a remote source, asynchronously.
    It takes a time proportional to the bytes requested to read.
    """
    await asyncio.sleep((size // 512) * 0.4)
    chunk = f"[chunk {no} ({size})]"
    return chunk


class AsyncDataStreamer:
    """Read 10 times into data"""
    LIMIT = 10
    CHUNK_SIZE = 1024

    def __init__(self):
        self.lecture = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.lecture >= self.LIMIT:
            raise StopAsyncIteration

        result = await recv(self.lecture, self.CHUNK_SIZE)
        self.lecture += 1
        return result

async def test():
    async for read in AsyncDataStreamer():
        logger.info("collector on read %s", read)

The test function will simply exercise the iterator, on which elements are produced one at a time, while calling an I/O task (in this example, asyncio.sleep).

With asynchronous generators, the same could be rewritten in a more compact way.

async def async_data_streamer():
    LIMIT = 10
    CHUNK_SIZE = 1024
    lecture = 0
    while lecture < LIMIT:
        yield await recv(lecture, CHUNK_SIZE)
        lecture += 1
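To consume it, the test() function shown before only needs its iterable swapped, plus an event loop to drive it (a sketch, assuming the same logger as above):

async def test():
    async for read in async_data_streamer():
        logger.info("collector on read %s", read)

loop = asyncio.get_event_loop()
loop.run_until_complete(test())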

Summary

It all started with generators. It was a simple way of having lazy computation in Python, and running more efficient programs that use less memory.

This evolved into coroutines, taking advantage of the fact that generators can suspend their execution. By extending the interface of generators, coroutines provided more powerful features to Python.

Coroutines were also improved to support better patterns, and the addition of yield from was a game changer that allows us to have better generators, refactor them into smaller pieces, and reorganize the logic better.

The addition of an event loop to the standard library helps to provide a referential way of doing asynchronous programming. However, the logic of the coroutines and the await syntax is not bound to any particular event loop. It’s an API [2] for doing asynchronous programming.

Asynchronous generators were the latest addition to Python that relates to generators, and they help build more compact (and efficient!) code for asynchronous iteration.

In the end, behind all the logic of async / await, everything is a generator. Coroutines are in fact (technically) generators. Conceptually they are different, and have different purposes, but in terms of implementation, generators are what make all this asynchronous programming possible.

Notes

[1] Needless to say, the results will vary from system to system, but we get an idea of the difference between both implementations.
[2] This is an idea by David Beazley, that you can see at https://youtu.be/ZzfHjytDceU