Summary
String interpolation
Type hints on variables
Ordered dicts by default
The secrets module for safe random generation
A new number format
Making asyncio usable
Data classes
datetime.fromisoformat()
Lazy type annotations
The walrus operator
Positional only arguments
Debugging with f-string
Merging dicts
Type Hinting Generics In Standard Collections
Stable annual release cadence
String methods to remove prefixes and suffixes
Stuff you shouldn't use
Better error messages
Pattern matching
Type Unions
zip(strict=True)
OpenSSL 1.1.1 is required
Keyword arguments on dataclasses
22% better performance
Exception groups and except*
The tomllib module
glob can filter directories
Time flies
It seems like only last week we were all talking about migrating from Python 2 to Python 3, and today we are already on Python 3.11.
While I love reading each new changelog and make a point of testing release candidates, I understand that a huge part of the devs have neither the time nor the desire for that.
They have stuff to code, and lives to live.
So this article is going to go through all the releases from 3.6 to 3.11 and list what I think you should know about each of them.
I won't list everything. In fact, I won't list most things.
Some major changes will be skipped completely, while some things that seem minor will get covered, because I believe a lot of people will find them useful or interesting.
Yes asynchronous generators and regex possessive quantifiers are cool, but I have a hunch most readers are more interested in string formatting.
Python 3.6
While 3.6 has reached end-of-life and therefore is not supported anymore, it brought significant changes to Python syntax, and I'm still meeting people who have never heard of them.
The most popular one, by far:
The f-strings
While PEP 498 is technically titled "Literal String Interpolation", nobody calls them that. "f-string" has won the battle of names.
f-strings are Python's way of doing string interpolation, a very convenient syntax to format strings. Forget about %s or .format(): you can now use variable names directly inside string literals if you use the f prefix:
>>> product = "bread"
>>> price = 10
>>> print(f"The {product} costs ${price}")
The bread costs $10
You can use code in the curly brackets:
>>> product = "bread"
>>> price = 10
>>> print(f"2 {product.upper()}s costs ${price * 2}")
2 BREADs costs $20
And even .format()'s mini formatting language:
>>> product = "bread"
>>> price = 10
>>> print(f"The {product} costs ${price:.2f}")
The bread costs $10.00
It's really nice for dates:
>>> import datetime as dt
>>> print(f"Today is {dt.date.today():%m/%d/%Y}")
Today is 05/01/2023
Type hints on variables
Since Python 3, you can put annotations on function parameters. In theory, you can put anything in there. In practice, the community mostly uses it to put information about the types of the expected arguments.
With 3.6, this ability extends to regular variables, so you can do:
price: float = 10
Remember the following, though:
This does nothing to your code. Python will not use the information to do anything, it will just store it. It's executable documentation.
This is completely optional. And even if you use it, you don't have to use it everywhere in the code. Annotating a few places is alright. Or none at all.
You will only benefit from it if you use an editor with good support for type hints (such as PyCharm, VSCode, etc) or a type checker (like mypy).
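For instance, here is a tiny sketch (the function and values are invented for illustration) of the kind of mistake a type checker catches while Python itself stays silent:
price: float = 10

def apply_discount(price: float, percent: int) -> float:
    return price * (1 - percent / 100)

apply_discount("10", 5)  # blows up at runtime; a type checker like mypy flags it ahead of time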
Ordered dicts by default
I'm cheating, since Python 3.7 is when this behavior officially became part of the language. But in practice, CPython 3.6 already ships with it.
In Python 3.5 and below, if you created a dictionary, the order of insertion was not guaranteed:
>>> users = {}
>>> users['Zulu'] = 1
>>> users['Alpha'] = 2
>>> users
{'Alpha': 2, 'Zulu': 1}
And now it is:
>>> users = {}
>>> users['Zulu'] = 1
>>> users['Alpha'] = 2
>>> users
{'Zulu': 1, 'Alpha': 2}
This means that collections.OrderedDict is rarely useful nowadays.
The secrets module
Not a secret module, but a module named "secrets".
Python ships with an excellent "random" module to generate pseudo-random data. While it's great for making games or shuffling playlists, it has also been used in the past to generate keys and passwords.
That's a problem, because the "random" module is not designed for that at all, and it's a security risk.
There are better alternatives to install, but "random" is in the stdlib, so it's tempting to use it. To solve that problem, a cryptographically secure "secrets" module has been added:
>>> import secrets
>>> secrets.token_hex()
'9839d9d3a240ff39e0dc0a449bb8378b92cf1a8f6e0d626a397cb280ce4723c1'
>>> secrets.choice(["Schrödinger", "Pavlov"]) + " Team"
'Schrödinger Team'
>>> secrets.choice(["Schrödinger", "Pavlov"]) + " Team"
'Pavlov Team'
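If what you need is an actual password or a URL-safe token, the module has you covered too (a small sketch, the lengths are arbitrary):
>>> import secrets
>>> import string
>>> alphabet = string.ascii_letters + string.digits
>>> "".join(secrets.choice(alphabet) for _ in range(16))
'Jk3fLq9ZxW2bN7Tp'
>>> secrets.token_urlsafe(16)
'qo0xcdNYg9uzwyCrjDHLVA'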
A new number format
Although it's not a very well-known feature, Python has always let you write number literals in different ways, e.g., you can write the number "35" in different bases:
>>> 0x23
35
>>> 0o43
35
>>> 0b100011
35
>>> 0x23 == 0b100011 == 0o43 == 35
True
Python 3.6 just adds some readability into the mix, letting you insert a "_" in any number to make some part of it stand out, e.g., if you want to separate the thousands:
>>> 1_000_000_000
1000000000
This doesn't change the value at all, it's just a visual aid you may use in your code to make numbers clearer.
Python 3.7
At the time this article is being written, 3.7 is the oldest supported version of Python.
The release gets the unofficial title of:
The version where asyncio is usable
asyncio, the stdlib module that lets you avoid waiting on your network card, was introduced in Python 3.4.
Yet it took 3 versions to mature into something you could use in production without getting a furious desire to go live off-grid deep in the Amazon.
Python 3.7 brings the following changes that make this possible:
You can start the loop with asyncio.run() (see the sketch after this list). Before that, you either did it wrong or implemented some version of this monstrosity. Most people just got it wrong, though, and generally suffered a lot.
You can add things to contextvars, a container that is specific to the coroutine and thread you are currently in. Suddenly, all the web frameworks could easily save a connection to a database per context. Given you use asyncio mostly for the web, that was kinda useful.
You have the asyncio.create_task() shortcut that replaces the confusing asyncio.ensure_future().
You have asyncio.get_running_loop(), which not only gives you the loop you are currently in (there used to be a bug preventing that; needless to say, those were not good times), but will not create one behind your back like asyncio.get_event_loop() does.
TCP_NODELAY is on by default on Linux. This could mean nothing to you, but it can make things up to 30 times faster.
async and await are now reserved keywords. You cannot create a variable named await by mistake.
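Put together, the post-3.7 happy path looks something like this (a minimal sketch; the names and delays are invented for illustration):
import asyncio

async def fetch(name, delay):
    # stand-in for real network I/O
    await asyncio.sleep(delay)
    return name

async def main():
    # create_task() schedules the coroutines to run concurrently
    tasks = [asyncio.create_task(fetch(n, 0.1)) for n in ("a", "b", "c")]
    print(await asyncio.gather(*tasks))

asyncio.run(main())  # the 3.7 way to start the event loop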
Bottom line, don't use asyncio before 3.7. If you have the choice, wait for 3.8, which has a few more goodies (although I won't list them in this post).
Data classes
Writing classes in Python is verbose:
>>> class Bender:
...
... def __init__(self, metal, percentage):
... self.metal = metal
... self.percentage = percentage
...
... def __repr__(self):
... return f"Bender(metal={self.metal!r}, percentage={self.percentage})"
...
>>> Bender('zinc', 40)
Bender(metal='zinc', percentage=40)
Dataclasses make that way less verbose:
>>> from dataclasses import dataclass
>>> @dataclass
... class Bender:
... metal: str
... percentage: float
...
>>> Bender('dolomite', 40)
Bender(metal='dolomite', percentage=40)
Data classes can do more than this. They can generate comparison methods, accept a factory for complicated default values, set __slots__ automatically (from 3.10 on), among other things.
Still, their best feature is to be "classes, but shorter".
Keep in mind data classes are pretty slow, but I've yet to come across a code base where they were the bottleneck, and I use them pretty liberally.
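Here is a rough sketch of the comparison and factory bits (field names invented for illustration):
>>> from dataclasses import dataclass, field
>>> @dataclass(order=True)
... class Bender:
...     metal: str
...     percentage: float
...     upgrades: list = field(default_factory=list)  # one fresh list per instance
...
>>> Bender('zinc', 40) < Bender('zinc', 60)
True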
datetime.fromisoformat()
This seems like a minor feature to be excited about, but I guarantee more coders will be enjoying this one than whatever TCP_NODELAY brings to the table.
datetime.fromisoformat() is a shortcut to parse dates in the "YYYY-MM-DD..." format:
>>> import datetime as dt
>>> str(dt.date.today())
'1985-10-26'
>>> dt.datetime.fromisoformat('1985-10-26')
datetime.datetime(1985, 10, 26, 0, 0)
>>> str(dt.datetime.now())
'1985-10-26 01:24:39.167076'
>>> dt.datetime.fromisoformat('1985-10-26 01:24:39.167076')
datetime.datetime(1985, 10, 26, 1, 24, 39, 167076)
It's also the start of beautiful time-related contributions from Paul Ganssle, but we won't list them in this article.
Annotations from the future
For compatibility purposes, Python has the __future__ imports. Those are imports you put at the top of a file to opt it into a behavior that will become the default one in future versions.
E.g., in Python 2.7, you could do:
>>> from __future__ import print_function
To turn the print statement into a print() function, like in Python 3.
3.7 adds from __future__ import annotations as an option to make the type annotations of the current file lazily evaluated.
It's easier to understand with an example. Let's say I make a useless type hint that contains a print (yes, we can do that, annotations can be any legal Python expression):
>>> def foo(bar: print('what?')):
... pass
...
what?
Note that the type hint is evaluated immediately, so we see the print.
However, if I send myself into the future:
>>> from __future__ import annotations
>>>
>>> def foo(bar: print('what?')):
... pass
...
Nothing happens. The print is only executed if I actually try to read the annotation:
>>> import typing
>>> typing.get_type_hints(foo)
what?
{'bar': <class 'NoneType'>}
At this stage you may be wondering why this is of any importance in your life and if you shouldn't be watching a primitive technology video instead.
But consider this:
from dataclasses import dataclass

@dataclass
class Chicken:
    child: Egg  # <- NameError: name 'Egg' is not defined

@dataclass
class Egg:
    parent: Chicken
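With lazy annotations, the same definitions go through, because Egg stays an unevaluated string until something actually inspects the hints:
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Chicken:
    child: Egg  # fine now, only evaluated if you call typing.get_type_hints()

@dataclass
class Egg:
    parent: Chicken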
Although, to be honest, I wouldn't use type hints in production before 3.9 at least, 3.10 being even better and 3.11 being the absolute best. It's a feature that really gains from using the very latest versions, as you will soon see.
Python 3.8
The walrus
PEP 572's Assignment Expressions is another change nobody calls by its official name. Everybody knows it as "the walrus operator", because it looks like an ASCII walrus:
:=
You gotta squint and tilt your head, though.
This allows for assigning content to a variable and testing it at the same time.
Even today it's rarely used, but when you do need it, it can be handy. E.g., it turns safely calculating the hash of a file from this:
import hashlib

hasher = hashlib.sha1()
with open('ubuntu-20.04.6-desktop-amd64.iso', 'rb') as f:
    block = f.read(1000000)
    while block:
        hasher.update(block)
        block = f.read(1000000)

print(hasher.hexdigest())
Into this:
import hashlib

hasher = hashlib.sha1()
with open('ubuntu-20.04.6-desktop-amd64.iso', 'rb') as f:
    while (block := f.read(1000000)):
        hasher.update(block)

print(hasher.hexdigest())
Yeah that's... a pretty small improvement. But it's nice.
Positional only arguments
Let's define a web-scale function ready to be sent to a map/reduce serverless pipeline:
>>> def add(a, b):
... return a + b
...
>>> add(1, 2)
3
>>> add(a=1, b=2)
3
In Python, if you put parameters AFTER a star, they become keyword-only:
>>> def add(a, *, b):
... return a + b
...
>>> add(a=1, b=2)
3
>>> add(1, 2)
TypeError: add() takes 1 positional argument but 2 were given
This has been legal Python for 15 years, believe it or not.
3.8 just balances it out with /, which states that anything BEFORE it is positional-only:
>>> def add(a, /, b):
... return a + b
...
>>> add(1, 2)
3
>>> add(a=1, b=2)
TypeError: add() got some positional-only arguments passed as keyword arguments: 'a'
f-string can debug
We all debug using print() sometimes.
Ok, often.
Ok, most of the time.
Well, they know that upstairs, and they have given you a shiny new format symbol:
=
Put that in your f-string, and suddenly, you can print an expression and its result in one go.
Before:
>>> a = 1
>>> print('a =', a)
a = 1
>>> print("a + 1 =", a + 1)
a + 1 = 2
Now:
>>> a = 1
>>> print(f'{a = }')
a = 1
>>> print(f"{a + 1 = }")
a + 1 = 2
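And as far as I can tell, the = marker combines with the usual mini formatting language:
>>> delay = 0.12345
>>> print(f"{delay = :.2f}")
delay = 0.12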
Python 3.9
Merging dicts
You can add lists, but you can't add dicts.
You can merge them using unpacking:
>>> {**{"a": 1}, **{"b": 2}}
{'a': 1, 'b': 2}
But I have the feeling this is not the first thing that popped into your mind when thinking about the problem.
To solve this, Python 3.9 gives a new meaning to |, creating the union of two dicts, like it already does with sets:
>>> {"a": 1} | {"b": 2}
{'a': 1, 'b': 2}
>>> d = {"a": 1}
>>> d |= {"b": 2}
>>> d
{'a': 1, 'b': 2}
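Note that, like with dict.update(), the right-hand side wins when keys collide:
>>> {"a": 1, "b": 2} | {"a": 3}
{'a': 3, 'b': 2}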
Type Hinting Generics In Standard Collections
Sorry, we don't have a cute name for PEP 585.
You can remember it as "the stuff that finally makes type hints bearable".
Before this feature, you had to do things like this to get proper typing:
from typing import List, Tuple
list_of_tuple_of_string: List[Tuple[str, str]] = []
Now you can just do this:
list_of_tuple_of_string: list[tuple[str, str]] = []
Python adopts a stable annual release cadence
PEP 602 is not complicated: we get a fresh new version of Python every year. Not a feature per se, but I thought you would want to know.
String methods to remove prefixes and suffixes
Finally, PEP 616 is a PEP with a clear title. It pretty much does what it says: strings have two new methods to remove stuff before and after them.
People misunderstood strip(), and it introduced bugs. Indeed, strip() can take several characters and remove them from the left and right of a string, but the devil is in the details: it removes individual characters, not a sequence of characters.
If you pass it cat, it will not remove the string cat, but it will remove any c, a, or t:
>>> "tic tac".strip('cat')
'ic '
To avoid those mistakes, ".removeprefix()" and ".removesuffix()" have been introduced:
>>> "tic tac".removesuffix('cat')
'tic tac'
>>> "tic tac cat".removesuffix('cat')
'tic tac'
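.removeprefix() works the same way, just at the other end:
>>> "cat tic tac".removeprefix('cat')
' tic tac'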
Stuff you shouldn't use
3.9 also introduces things you should avoid:
Flexible decorator syntax. You can now invoke Cthulhu by mistakenly starting a dark ritual, since almost anything can be put after @. Don't. Use dumb decorators.
The standard library now has full support for time zones through the new zoneinfo module, backed by the IANA Time Zone DB; so you can fail at handling dates, but now internationally. Be Grug, use pendulum.
A very official bug ticket led to the addition of the HTTP status code 418 IM_A_TEAPOT. Ok, this one is funny.
Python 3.10
Python 3.10 has been my favorite release since 3.6, it's packed with a lot of cool stuff, and above all:
The best feature ever
The best feature ever added to a Python release in 10 years is not a new module, or a new function, or a new syntax.
It's better error messages!
In 3.10 (and the trend continues in 3.11 and 3.12), error messages improved considerably:
res = [event, count for event, count in stats.items()]
^^^^^^^^^^^
SyntaxError: did you forget parentheses around the comprehension target?
bool = {'Yes': True, 'No': False, 'FileNotFound': None
^
SyntaxError: '{' was never closed
bool = {'Yes': True, 'No': False 'FileNotFound': None}
^^^^^^^^^^^^^^^^^^^^
SyntaxError: invalid syntax. Perhaps you forgot a comma?
And many, many more.
Pattern matching
I have a love/hate relationship with match/case, the new keywords added in this release. On one hand, they make some checks clearer by allowing some nifty declarative comparisons. On the other hand, the syntax is so full of gotchas that one PEP was not enough to introduce it.
Seriously, they had to do three: PEP 634, 635 and 636.
So what is this about?
In short, it lets you compare something to various data structure shapes, and act on it:
>>> from dataclasses import dataclass
>>> @dataclass
... class Point:
... x: int
... y: int
...
>>> line = [Point(1, 2), Point(4, 3)]
>>> match line:
... case [Point(x=0, y=0), _]:
... print("From origin")
... case [_, Point(x=0, y=0)]:
... print("To origin")
... case [pt1, pt2] if pt1 == pt2:
... print("Err, that's a dot?")
... case [pt1, pt2]:
... print(pt1, pt2)
...
...
Point(x=1, y=2) Point(x=4, y=3)
I'm not going to dive into details about this feature here, because to be meaningful, it would require an entire dedicated article.
Type Unions
PEP 604 renders typing.Union and typing.Optional obsolete.
Before:
>>> from typing import Union, Optional
>>> data: Optional[Union[bytes, str]] = None
After:
>>> data: bytes | str | None = None
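As a bonus, if I read PEP 604 correctly, the new unions also work at runtime:
>>> isinstance("hello", bytes | str)
True
>>> isinstance(1.5, bytes | str)
False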
It's the little things.
zip() gets strict
zip() stops when the first iterable stops:
>>> list(zip("abcdefghijklm", (1, 2, 3)))
[('a', 1), ('b', 2), ('c', 3)]
It's often what you want. And sometimes, what you want instead is to fill in the missing values, for which we have itertools.zip_longest().
But what if you are in a situation where having iterables of different sizes is a bug, and zip() is hiding that from you?
Thanks to PEP 618, you can now force a strict mode that will raise an exception in that case:
>>> list(zip("abcdefghijklm", (1, 2, 3), strict=True))
ValueError: zip() argument 2 is shorter than argument 1
Had to use it last week. Neat.
OpenSSL 1.1.1 is required
PEP 644 requires OpenSSL, and a certain version of it at that. That seems uninteresting until you realize it solves a lot of packaging issues. So by the sheer act of using Python 3.10, you reduce your chances of packaging pain.
I told you it was a good release.
Keyword arguments on dataclasses
You can now require parameters of data classes to be keyword-only (like with functions).
Either all attributes:
>>> from dataclasses import dataclass
>>> @dataclass(kw_only=True)
... class Point:
... x: int
... y: int
...
>>> Point(1, 2)
TypeError: Point.__init__() takes 1 positional argument but 3 were given
Or the ones after KW_ONLY:
>>> from dataclasses import dataclass, KW_ONLY
>>> @dataclass
... class Point:
... x: int
... y: int
... _: KW_ONLY
... z: float
>>> Point(1, 2, 3)
TypeError: Point.__init__() takes 3 positional arguments but 4 were given
The error message sucks though (the argument count includes self).
This fixes a big problem with data classes: before this, you couldn't inherit from a data class whose fields have default values and then add a field without one:
>>> @dataclass
... class Point:
... x: int = 0
... y: int = 0
...
>>> @dataclass
... class ColoredPoint(Point):
... color: str
...
TypeError: non-default argument 'color' follows default argument
But if you use kw_only, it works:
>>> @dataclass(kw_only=True)
... class Point:
... x: int = 0
... y: int = 0
...
>>> @dataclass(kw_only=True)
... class ColoredPoint(Point):
... color: str
...
Python 3.11
22% faster on average
Python 3.11 is the release known to be significantly faster than the previous one.
Nothing to add.
Exception groups and except*
This is weird, but you can now raise and catch several exceptions at the same time:
>>> raise ExceptionGroup("Isildur's Bane", [ValueError("H"), ValueError("D"), ValueError("E")])
+ Exception Group Traceback (most recent call last):
| File "<stdin>", line 1, in <module>
| ExceptionGroup: Isildur's Bane (3 sub-exceptions)
+-+---------------- 1 ----------------
| ValueError: H
+---------------- 2 ----------------
| ValueError: D
+---------------- 3 ----------------
| ValueError: E
+------------------------------------
>>> try:
... raise ExceptionGroup("Isildur's Bane", [ValueError("H"), ValueError("D"), ValueError("E")])
... except* ValueError:
... print('Capture exception group')
...
Capture exception group
except* cannot be mixed with regular except in the same try block, and it will catch exception groups and lone exceptions alike, but it always binds what it caught to an ExceptionGroup.
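So even a lone exception comes out wrapped, something like this:
>>> try:
...     raise ValueError("just one")
... except* ValueError as group:
...     print(type(group), group.exceptions)
...
<class 'ExceptionGroup'> (ValueError('just one'),)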
That's not something you are going to use often, and is mostly useful for frameworks that deal with high concurrency.
I just include it so that you know this strange syntax has been added.
We can now read TOML with the stdlib
With pyproject.toml being the new star of the show to configure Python tooling, having something to read the TOML format in the stdlib was becoming more and more important.
Writing TOML, though, is hard to get right. So the core devs decided to hold off on that, and only implemented loading in the new module: tomllib.
>>> import tomllib
>>> toml_str = """
... python-version = "3.11.0"
... python-implementation = "CPython"
... """
>>> data = tomllib.loads(toml_str)
>>> print(data["python-version"])
3.11.0
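To read a file instead of a string, tomllib.load() wants the file opened in binary mode, which trips people up (assuming a pyproject.toml sits in the current directory):
>>> import tomllib
>>> with open("pyproject.toml", "rb") as f:  # "rb", not "r"
...     data = tomllib.load(f)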
glob is better
It's a tiny, tiny addition hidden in the depths of the changelog, but glob and rglob can now filter on directories, if you end the pattern with a "/":
>>> import glob
>>> glob.glob('/tmp/*.log')
['/tmp/psync_err.log']
>>> glob.glob('/tmp/*python*/')
['/tmp/python-languageserver-cancellation/', '/tmp/pythontemp/']
We skipped a lot
This article is super long, and yet we haven't even scratched the surface of what all these releases brought to the table. But I think with just this, you are ready to code until 2025 without burning all your ChatGPT credits solely on asking about the new stuff.