Summary
This article contains nothing new. In fact, those things have been talked about everywhere on the internet for the last 15 years. Yet, I still see them in the wild, so I decided to add one more reminder.
# -*- coding: utf-8 -*- is obsolete, and this syntax is not what most users should use anyway.
Good use cases for len() exist, but they are rare.
Tracking things in loops is often better delegated to the language.
Just use the in operator to check for inclusion.
Getters and setters for lone attributes are not idiomatic.
# -*- coding: utf-8 -*-
This first line is doubly unnecessary.
First, it's unnecessary in Python 3 because UTF-8 is the default source encoding, so you don't need to specify it. Second, if your file uses an encoding other than UTF-8 (which I would not recommend, but it could happen), let's say "latin1", then you can use the much simpler syntax # coding: NAME_OF_ENCODING.
E.g.:
# coding: latin1
So why did we use to need this line? Because in Python 2, the default encoding was ASCII, and you had to change that if you wanted to use non-ASCII characters in your source code. Like, if I wanted to write my name in a comment in a file, it contains a French é, which is not ASCII, so Python would crash.
Yep. For my name. In a comment.
There is no such thing as raw text, remember?
Using utf8 is today pretty much standard, so Python 3 made it the default value, and therefore you don't need to specify it.
Note that this comment doesn't make your file utf8; your editor is in charge of that. It just tells Python you used utf8, so that Python can decode the file properly.
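To make the "no raw text" point concrete, here is a minimal sketch of what happens when bytes are decoded with the wrong encoding. The word used is just an illustration, not something from a real file:

```python
# Illustrative word containing a French é (written as an escape so this
# snippet itself stays pure ASCII).
word = "caf\u00e9"

utf8_bytes = word.encode("utf-8")     # é becomes two bytes
latin1_bytes = word.encode("latin1")  # é becomes one byte

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Reading UTF-8 bytes as latin1 silently produces garbage.
garbled = utf8_bytes.decode("latin1")
print(garbled)

# Reading latin1 bytes as UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

The same bytes mean different things depending on the declared encoding, which is why Python needs to know which one your editor used.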
Also, if:
# coding: utf8
works, why do people use:
# -*- coding: utf-8 -*-
?
Well, # -*- coding: utf-8 -*- also works. In fact, any comment on the first or second line of the file that matches the following regex will work:
^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
So:
# encoding=utf-8
would work. So would:
# I love coding: utf-8 !!!
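If you want to check this yourself, here is a small sketch that applies the regex quoted above with the standard re module. The helper name declared_encoding and the sample lines are mine, for illustration only. Note that the regex requires ":" or "=" immediately after "coding", with no space in between:

```python
import re

# The regex quoted above. Python only looks for it in the first two lines of a file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding name declared in a comment line, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


samples = [
    "# -*- coding: utf-8 -*-",
    "# coding: latin1",
    "# vim: set fileencoding=utf-8 :",
    "# I love coding: utf-8 !!!",
    "# just a comment",
]
for line in samples:
    print(repr(line), "->", declared_encoding(line))
```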
This feature was intended so that it would also be used by editors like emacs or vim that understand similar types of comments. It just happens that the syntax for emacs is:
# -*- coding: utf-8 -*-
And one popular Python tutorial 20 years ago was written with this header by an emacs user, which then got copied/pasted everywhere by others without thinking much about it.
Humans are funny creatures and their creations are a product of their quirks.
Too much len()
There are good use cases for len(), but this is not one of them:
>>> flowers = ['rose', 'tulip', 'ramona']
>>> for i in range(len(flowers)):
...     print(flowers[i])
...
rose
tulip
ramona
This is rare (and will get rarer thanks to ChatGPT), but you still see that sometimes from people coming from other languages.
Because of how iteration works, here is the natural way to do it in Python:
for f in flowers:
    print(f)
In the same way, to check if flowers is empty, there is no need for:
if len(flowers) == 0:
In Python, any object can be evaluated as a boolean. Built-in containers (lists, tuples, dicts, sets, strings) are False when empty and True otherwise.
So:
if not flowers:
is all you need to do.
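Here is a quick sketch of that rule in action. The Bouquet class is a made-up example to show that your own containers get the same behavior as soon as they define __len__:

```python
# Empty built-in containers are falsy, non-empty ones are truthy.
for container in ([], (), {}, set(), ""):
    assert not container

for container in (["rose"], ("rose",), {"rose": 1}, {"rose"}, "rose"):
    assert container


class Bouquet:
    """Hypothetical custom container, for illustration only."""

    def __init__(self, flowers):
        self.flowers = list(flowers)

    def __len__(self):
        # With no __bool__ defined, Python falls back to __len__
        # to decide truthiness: zero means False.
        return len(self.flowers)


empty = Bouquet([])
full = Bouquet(["rose", "tulip"])

if not empty:
    print("The bouquet is empty")
if full:
    print("The bouquet has flowers")
```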
Do Not Track
Manually tracking things, like the number of loop turns, counters or a found flag, is also often unnecessary in Python.
The typical example is limiting to a number of loop turns:
>>> i = 0
>>> while i < 10:
...     print(i)
...     i += 1
...
0
1
2
3
4
5
6
7
8
9
It is usually expressed idiomatically with:
for i in range(10):
    print(i)
range() is quite sophisticated. It accepts a start value and an end value, but also a step value:
>>> for i in range(5, 20, 3):
...     print(i)
...
5
8
11
14
17
And checking if a number is in the range is very fast, orders of magnitude faster than with a list:
>>> numbers = range(5, 10000, 3)
>>> %timeit 15 in numbers
49.7 ns ± 0.738 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> numbers = list(range(5, 10000, 3))
>>> %timeit 15 in numbers
13.5 µs ± 301 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
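The reason for the gap is that a range doesn't need to scan its elements: for integers, membership can be decided with a bit of arithmetic. The function below is only a sketch of that idea for positive steps (in_range is a name I made up; the real implementation lives in C and also handles negative steps):

```python
def in_range(value, start, stop, step):
    """Arithmetic membership test for a range with a positive step."""
    # Inside the bounds, and landing exactly on a multiple of step from start.
    return start <= value < stop and (value - start) % step == 0


numbers = range(5, 10000, 3)

# The sketch agrees with the real thing on hits, misses and boundaries.
for candidate in (4, 5, 15, 17, 9998, 10000):
    assert in_range(candidate, 5, 10000, 3) == (candidate in numbers)

print(15 in numbers)  # False: 15 is not 5 plus a multiple of 3
print(17 in numbers)  # True
```

A list has no such shortcut, so it has to compare elements one by one.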
Similarly, attaching an index to each element you iterate on with:
>>> for i in range(len(flowers)):
...     print(i + 1, flowers[i])
...
1 rose
2 tulip
3 ramona
Is better done with:
for i, flow in enumerate(flowers, 1):
    print(i, flow)
There is a lesser-known one, which is about tracking whether you have found something.
Let's try to find a flower that ends with the letter "z":
>>> found = False
>>> for f in flowers:
...     if f.endswith('z'):
...         found = True
...         print('Found it!')
...         break
...
>>> if not found:
...     print('Not found :(')
...
Not found :(
Well, this is what the keyword else is for. Indeed, else works with if, but surprisingly also with while and for, with a different effect: the else block runs only if the loop finished without hitting break:
for f in flowers:
    if f.endswith('z'):
        print('Found it!')
        break
else:
    print('Not found :(')
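To see both branches, here is the same loop wrapped in a small function so it can be called with different suffixes. The function name find_suffix is just for this illustration:

```python
def find_suffix(flowers, suffix):
    """Return a message saying whether any flower ends with suffix."""
    for f in flowers:
        if f.endswith(suffix):
            message = f"Found it: {f}"
            break  # leaving through break skips the else block
    else:
        # Reached only when the loop ran to completion (or the list was empty).
        message = "Not found :("
    return message


flowers = ['rose', 'tulip', 'ramona']
print(find_suffix(flowers, 'a'))  # Found it: ramona
print(find_suffix(flowers, 'z'))  # Not found :(
print(find_suffix([], 'a'))       # Not found :(
```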
I'm in!
If all you need to know is if something is contained in something else, like a word in a sentence, an element in a list/tuple/set, a key in a dictionary, etc., the in keyword should be the first thing to reach for.
E.g.:
>>> "ramona".find('a') != -1
True
Is better written:
>>> "a" in "ramona"
True
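The same operator works across the built-in containers. A short sketch, with made-up data:

```python
flowers = ['rose', 'tulip', 'ramona']
prices = {'rose': 3, 'tulip': 2}  # hypothetical price list

checks = {
    "list": 'tulip' in flowers,               # element in a list
    "dict key": 'rose' in prices,             # dictionaries test their keys
    "dict value": 3 in prices,                # values are not checked
    "set": 'daisy' not in set(flowers),       # not in is the negated form
    "substring": 'mon' in 'ramona',           # strings test substrings
}
for name, result in checks.items():
    print(name, result)
```

Note that on a dictionary, in only looks at the keys; use prices.values() if you need to search the values.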
Getters and setters
Strangely, this one is still quite common. I'm working on a project right now, where I had to help my colleagues come out of their habit of creating methods to get or set the value of an attribute.
Just like variables, attributes are dumb in Python. They can't be final, static, virtual or private. They are just a name we set on something.
And no, prefixing with _ or __ doesn't make anything private. The first one is a convention that tells users and tooling that you don't intend to offer any stability guarantee for that name. The second is an artifact of Python's inheritance system that triggers attribute name mangling, but the value is still accessible through __dict__ under a different name (_ClassName__attribute).
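A short demonstration, using a throwaway Person class whose attribute names are only for illustration:

```python
class Person:
    def __init__(self, age):
        self._age = age            # single underscore: convention only
        self.__secret = "hello"    # double underscore: name gets mangled


p = Person(42)

# The "protected" attribute is a plain attribute.
print(p._age)

# The "private" one is stored under a mangled name, visible in __dict__.
print(p.__dict__)
print(p._Person__secret)

# Outside the class body, the unmangled name simply does not exist.
try:
    p.__secret
    unmangled_exists = True
except AttributeError:
    unmangled_exists = False
print(unmangled_exists)  # False
```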
So the culture in Python is just to access the attributes directly. You have an age attribute on a person? Don't put a get_age() method to access it. Let others access it directly.
Getters and setters have a place, but they need to perform some kind of logic, other than accessing the variable, like calculating the age from the date of birth, or checking if you have the permission to access this information from a database.
Another reason to use getters and setters is when you need a parameter. E.g., if knowing the age of something requires passing the calendar in which to calculate the age from the date of birth, then yes, use a getter.
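As a rough sketch of a getter that earns its keep, here is one that takes a parameter. To keep it short I pass a reference date rather than a whole calendar, and the names (Person, get_age, on) are mine:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    birth_date: date

    def get_age(self, on: date) -> int:
        """Age in whole years on the given day."""
        # Has the birthday already happened in the year of the reference date?
        had_birthday = (on.month, on.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return on.year - self.birth_date.year - (0 if had_birthday else 1)


alice = Person(birth_date=date(1990, 6, 15))
print(alice.get_age(on=date(2020, 6, 14)))  # 29, the day before the birthday
print(alice.get_age(on=date(2020, 6, 15)))  # 30
```

Because the caller has to supply something, this can't be a plain attribute, so a method is the right tool.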
But if you need only to get the attribute, don't bother.
Now, what if in the future, we want to add a bit of logic? Wouldn't it be safer to make a getter already, so that we don't break compatibility in the future?
There are two answers to that.
If the logic requires arguments, you will break the compat anyway even if you already have a getter and a setter, so it doesn't add much value.
If the logic requires no argument, then Python provides a tool for this, the @property decorator.
You would then go from:
from dataclasses import dataclass

@dataclass
class Person:
    age: int
To:
class Person:
    @property
    def age(self) -> int:
        ...
This lets you add logic, while the rest of the world still accesses it like an attribute, so nothing breaks.
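Here is a fuller sketch of that migration, assuming the logic you want to add later is a validation rule (the rule itself is invented for the example). Calling code keeps reading and writing person.age exactly as before:

```python
class Person:
    def __init__(self, age: int):
        self.age = age  # goes through the setter below

    @property
    def age(self) -> int:
        return self._age

    @age.setter
    def age(self, value: int) -> None:
        # The logic added after the fact: reject impossible values.
        if value < 0:
            raise ValueError("age cannot be negative")
        self._age = value


p = Person(30)
p.age += 1        # same syntax as with a plain attribute
print(p.age)      # 31

try:
    p.age = -5
    rejected = False
except ValueError:
    rejected = True
print(rejected)   # True
```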
Of course, there is a debate to be had about the performance implications, but then I would say if your performance profile changes that much on a method, it would be breaking compat as well. The API contract is not the only thing that matters.
I disagree with some of these (the len bit most strongly) but I appreciate seeing people talk about them.
I agree with most of these, but I don't like for/else. I rarely see it in the wild, and whenever I do, I have to pause and think about what it does. I think the flag pattern is clearer.