Summary
You can create a very peculiar object using parentheses on a comprehension:
>>> squares = (x * x for x in range(10))
>>> squares
<generator object <genexpr> at 0x7ff553997bc0>
Generators are lazy Python iterables that create elements one by one, on demand, and don't hold them in memory.
They can be created with another syntax, the yield keyword:
def squares(n=10):
    for x in range(n):
        yield x * x
But using yield is tricky, as it uses the function syntax, despite the fact you are not writing a function at all.
Still, it allows powerful and complex algorithms to be hidden behind a simple for loop:
>>> for n in squares(5):
... print(n)
...
0
1
4
9
16
This keyword can be used to make any object iterable as well:
>>> class Car:
...
...     def __init__(self, gaz):
...         self.gaz = gaz
...
...     def __iter__(self):
...         while self.gaz:
...             yield self.gaz
...             self.gaz -= 1
...             print("Vrooom")
...
>>> for remaining_gaz in Car(5):
... print(f"There is {remaining_gaz}l in the tank")
...
There is 5l in the tank
Vrooom
There is 4l in the tank
Vrooom
There is 3l in the tank
Vrooom
There is 2l in the tank
Vrooom
There is 1l in the tank
Vrooom
Since generators are just iterables, you can use them with the same mindset as any other iterable: by piping things to each other.
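For instance (a little sketch of my own, not from the examples above), you can chain generator expressions like pipes, and nothing runs until the final consumer pulls the elements through:

```python
# Each step is a lazy pipe: nothing is computed until sum() asks for elements.
numbers = range(10)
squares = (x * x for x in numbers)          # 0, 1, 4, 9, ...
evens = (s for s in squares if s % 2 == 0)  # keep only the even squares

total = sum(evens)  # the whole pipeline runs here, one element at a time
print(total)  # 120 (0 + 4 + 16 + 36 + 64)
```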
Let's rewind a little
Since this is a series of articles, I'll assume you've read what comes before. In the previous post, we introduced the philosophy behind Python iteration, and we briefly talked about how comprehensions worked.
Indeed, instead of:
>>> anniversary_presents = ["auryn", "the heart of the ocean", "ring of free action"]
>>> titled_presents = []
>>> for present in anniversary_presents:
... titled_presents.append(present.title())
>>> titled_presents
['Auryn', 'The Heart Of The Ocean', 'Ring Of Free Action']
You can do:
>>> titled_presents = [present.title() for present in anniversary_presents]
>>> titled_presents
['Auryn', 'The Heart Of The Ocean', 'Ring Of Free Action']
If the syntax is a bit twisted for you, think of it this way:
We take this from the long version of the script:
titled_presents = []
We add the for loop in the middle (without the colon):
titled_presents = [for present in anniversary_presents]
Then we take what's INSIDE the .append(), and put it on the left of the for. This is what produces the transformation for each element:
titled_presents = [present.title() for present in anniversary_presents]
So the long version reads like "for each present, make a title, and add it to the list".
But the comprehension version reads like "create a list out of titles made from each present".
The "put everything in the list" part is actually optional because we have…
The different types of comprehensions
You can create a list with [1, 2, 3], so it seems logical that using brackets in a comprehension makes it create a list. But you can, actually, create other things.
Indeed, if you use {1, 2, 3} in Python, you would create a set. And if you use curly braces in a comprehension, that is also what you would get:
>>> titled_presents = {present.title() for present in anniversary_presents}
>>> type(titled_presents)
<class 'set'>
Now, if you put key/value pairs between our mustaches, like {"a": 1, "b": 2, "c": 3}, you would get a dictionary. And again, with a comprehension, it's the same:
>>> titled_presents = {p.title(): f"${i} 000 000 000" for i, p in enumerate(anniversary_presents, 1)}
>>> titled_presents
{
'Auryn': '$1 000 000 000',
'The Heart Of The Ocean': '$2 000 000 000',
'Ring Of Free Action': '$3 000 000 000'
}
Although, those are overpriced if you ask me. Unless you get to ride Falcor.
But there is one thing that doesn't match our expectations, and that's the one with parentheses. Do (1, 2, 3) and you get a tuple. But do this in a comprehension and you get...
>>> titled_presents = (present.title() for present in anniversary_presents)
>>> type(titled_presents)
<class 'generator'>
A what, now?
Making things at the last minute
Generators are very strange yet powerful iterables. They are a bit like CS students: they only get work done at the very last minute, when it's required.
First, they have no length, and you can't get any element at a particular index:
>>> len(titled_presents)
...
TypeError: object of type 'generator' has no len()
>>> titled_presents[0]
...
TypeError: 'generator' object is not subscriptable
That's because generators don't contain any elements. In fact, right now, no calculation has even been performed on the presents. None of them have been called with .title() at this stage.
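You can check that for yourself with a hypothetical helper (names are mine, just for illustration) that shouts whenever it actually runs:

```python
def noisy_title(present):
    # Hypothetical helper: prints only when the work really happens
    print(f"Calling .title() on {present!r}")
    return present.title()

anniversary_presents = ["auryn", "the heart of the ocean", "ring of free action"]
titled_presents = (noisy_title(p) for p in anniversary_presents)

# At this point, noisy_title() has not printed a single thing.
print("Generator created, and no work has been done yet")
```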
Generators are iterables, yes. But they are lazy iterables, meaning they only start to run their code when you read them:
>>> for present in titled_presents:
... print(present)
...
Auryn
The Heart Of The Ocean
Ring Of Free Action
Generators produce one element at a time, and then forget about it. They don't store anything. If you try to read it again, nothing will happen:
>>> for present in titled_presents:
... print(present)
...
>>>
To run the code one more time, we need to create a new generator. You can store the generator values by turning it into another data structure, like a list:
>>> titled_presents = (present.title() for present in anniversary_presents)
>>> saved_titled_presents = list(titled_presents)
>>> saved_titled_presents
['Auryn', 'The Heart Of The Ocean', 'Ring Of Free Action']
>>> list(titled_presents) # can't read it twice!
[]
As you know by now, all iterables come with their iterator, and a for loop is just calling next() on it. It's true for generators too:
>>> titled_presents = (present.title() for present in anniversary_presents)
>>> generator_iterator = iter(titled_presents)
>>> next(generator_iterator)
'Auryn'
>>> next(generator_iterator)
'The Heart Of The Ocean'
>>> next(generator_iterator)
'Ring Of Free Action'
Every time you call next(), the generator forgets all about the previous element, calculates the next one on demand, and returns it.
Nothing is stored. Nothing is produced before you ask for it.
For this reason, generators are very nifty tools: they can save memory if you don't need to have all elements loaded together. And they can save CPU if you only need to work on some elements, and not the whole bag.
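A small sketch of the CPU-saving part: pull only the elements you need, and the rest is simply never computed.

```python
from itertools import islice

# A generator over a huge range: the million squares are never all stored.
squares = (x * x for x in range(1_000_000))

# Take only the first three; the other 999,997 are never calculated.
first_three = list(islice(squares, 3))
print(first_three)  # [0, 1, 4]
```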
Introducing yield
Using a comprehension to create generators is nice, but limited. However, there is another way to create them: the weirdest Python keyword ever, yield.
First, it's weird, because it's pronounced like "wield". And I didn't know that. I wrongly pronounced "yield" as "yeah-eld" with my terrible French accent for years. I gave a talk at a PyCon conference on iteration where I loudly insisted on saying it that way again and again for an hour. Not sure you can use this information, but I figured I'd share.
Second, it's weird because it changes what the code around it does. There is no other keyword that does that, except __future__ imports, but that's their purpose.
That's why yield is hard to grasp for beginners. It's super confusing.
Take this function:
def hammers():
    print('Here is a list of my famous hammers:')
    return "Moradin's"
    print("Cause I love Baldur's Gate")
    return "Grabthar's"
    print("What a saving!")
    return "Mjolnir"
    print("Since that's the only one most people know about")
It's simple, nice and easy. You call it, it runs. Full stop:
>>> print(hammers())
Here is a list of my famous hammers:
Moradin's
In fact, most of it is dead code. After the first return, it exits. No Alan Rickman for you.
But now look at this one. I just changed all the return with yield:
def hammers():
    print('Here is a list of my famous hammers:')
    yield "Moradin's"
    print("Cause I love Baldur's Gate")
    yield "Grabthar's"
    print("What a saving!")
    yield "Mjolnir"
    print("Since that's the only one most people know about")
And...
>>> print(hammers())
<generator object hammers at 0x7ff5514d3f40>
WTF?
Honey, it's not what you think
Notice how "Here is a list of my famous hammers" was not even printed?
But it's the first line of the damn def block!
That's because the code inside the def doesn't run. At all.
See, that's what I was talking about: introducing yield changed everything.
When a def contains a yield, it's not a function anymore.
Forget everything you know about functions, because it's not one:
It doesn't run when you call it. Nope.
Instead it fabricates a new generator object, and returns that.
Unlike return, you can reach several yields.
So putting a yield in there modified the very nature of the code you are writing around it. It's actually a design error, in my opinion, and I talk about this problem in the Python Xmas list. They should have done it like they did async/await.
But let's come back to this generator. As I told you a few lines above, generators only start to run once you read them. This one is no exception.
The first time you (or a for loop, remember this is just an explanation, you will rarely call next() manually) ask for one element, it will execute from the def to the first yield and output its value:
>>> iterator = hammers()
>>> print(next(iterator))
Here is a list of my famous hammers:
Moradin's
Then, something peculiar happens.
The generator freezes everything inside! It stops at the first yield and remembers it's there. All variables are saved too. The generator becomes its own little frozen world, with its own little memory space.
Then you call next() again:
>>> print(next(iterator))
Cause I love Baldur's Gate
Grabthar's
The generator wakes up. It unfreezes everything. It starts back right after the first yield this time. And it runs the code until the next yield.
Then it goes back to sleep. Saves everything again, position and state.
Then, you call next() again:
>>> print(next(iterator))
What a saving!
Mjolnir
Wakes up. Runs the code from the second yield to the last yield. Freezes again.
Finally, you call next() one last time:
>>> print(next(iterator))
Since that's the only one most people know about
...
StopIteration
And it unfreezes, starts from the last yield, and has nothing more to do. So it raises StopIteration, like all iterators that don't have any more elements to give you.
As I said, you will rarely use it that way. What you will do is put it in a for loop:
>>> for hammer in hammers():
... print(hammer)
...
Here is a list of my famous hammers:
Moradin's
Cause I love Baldur's Gate
Grabthar's
What a saving!
Mjolnir
Since that's the only one most people know about
Why this instead of a comprehension?
This looks like a comprehension with extra steps, so why would you use it?
Because it supports arguments and can contain much more complicated code.
E.g., generating all the Mondays in a given month and year:
import calendar
import datetime as dt
def get_mondays(year, month):
    _, end = calendar.monthrange(year, month)
    for day in range(1, end + 1):
        if calendar.weekday(year, month, day) == calendar.MONDAY:
            yield dt.date(year, month, day)
The code is doing some work, but from the user's side, it looks like a simple for loop:
>>> for date in get_mondays(2023, 7):
... print(date)
...
2023-07-03
2023-07-10
2023-07-17
2023-07-24
2023-07-31
get_mondays(2023, 7) returns a generator. The for loop calls iter() on it, then calls next() again and again. Every time it does, the code activates. It may loop several times internally before reaching the yield, because of the if, but eventually it does. It then outputs the date and freezes. The subsequent next() starts from the yield, goes up to the internal for loop, makes a few turns, and reaches the yield again.
The magic is that all variables are stored, and the position of the last yield is saved as well. This generator is its own little complex universe, with its end, year and month variables all saved internally. When it stops, it knows where it is. When it starts back, it knows from where.
Every time you call get_mondays(), you create a new generator, and hence a new little universe, with its own internal state and position, separated from all the other generators.
The beauty of generators is that they are regular iterables. So you can use them as pluggable pipes too.
Let's say I want all the Mondays of all the months for a hundred years:
def generate_mondays_over_a_hundred_years(start_year):
    for year in range(start_year, start_year + 100):
        for month in range(1, 13):
            for monday in get_mondays(year, month):
                yield monday
If I do:
>>> from itertools import islice
>>> for monday in islice(generate_mondays_over_a_hundred_years(2023), 5):
...     print(monday)
...
2023-01-02
2023-01-09
2023-01-16
2023-01-23
2023-01-30
Without the islice(), this would print over 5,000 dates, yet only a single date would be in memory at any given time.
Using generators everywhere instead of lists means you never accumulate anything: your connected pipes just process one element through the whole pipeline at a time.
In fact, it's so convenient there is a shortcut for it, yield from:
def generate_mondays_over_hundred_years(start_year):
    for year in range(start_year, start_year + 100):
        for month in range(1, 13):
            yield from get_mondays(year, month)
Technically, yield from works with any iterable, but it respects the lazy nature of generators. And they deserve respect. And love.
Making anything iterable
Python being Python, you can imagine there is a dunder method waiting to turn any random object into something iterable. And you would be right: it's __iter__, which, as you expect, is really what iter() calls under the hood.
E.g., you can make a traffic light that contains all the info about how it behaves:
import time

class TrafficLight:

    def __init__(self, tempo, top_color, mid_color, bottom_color):
        self.top_color = top_color
        self.mid_color = mid_color
        self.bottom_color = bottom_color
        self.night = False
        self.tempo = tempo

    def night_signal(self):
        yield self.mid_color
        time.sleep(self.tempo / 2)
        yield self.mid_color
        time.sleep(self.tempo / 2)
        yield self.mid_color

    def day_signal(self):
        yield self.top_color
        time.sleep(self.tempo)
        yield self.mid_color
        time.sleep(self.tempo)
        yield self.bottom_color

    def __iter__(self):
        if self.night:
            yield from self.night_signal()
        else:
            yield from self.day_signal()
Then you can loop on it to get the signals:
>>> for signal in TrafficLight(1, "red", "orange", "green"):
... print(signal)
...
red
<one second wait>
orange
<one second wait>
green
Tips and tricks
Generators are just iterables. You can use sum(), all(), zip(), etc. on them! Even unpacking!
There is a whole stdlib module friendly to generators: itertools. It's packed with nice utilities that all return a generator themselves, preserving the laziness of your pipes.
You can send things inside generators with the .send() method. They are not just for pulling stuff out, you can push too. But it's quite complicated to use, so I don't recommend it unless you really know what you are doing.
numpy really doesn't like iteration. If you use numpy, keep your for loops and your yield away from it.
Generators are their own iterator:
>>> gen = hammers()
>>> iter(gen) is gen
True
You don't really need to call iter() on them. But since other iterables need it, doing so keeps the code consistent.