Python as a configuration language

Significant white spaces worked for YAML, OK?

Jan 23, 2024

Summary

Conf is not a solved problem. While TOML, YAML and JSON are popular choices in the wild, I like to use Python for:

Defining the conf data, like Django does. Except I would recommend not making it importable.
Defining the conf schema, with something like Pydantic.

There are good reasons not to do so, though. In that case, I favor TOML for small files and JSON(5) for large ones. Pydantic can validate those as well.

If forced to use YAML, or if you need to share a conf or produce it dynamically, CUELang is a nice tool that will make your life easier, especially if you can't use Pydantic at all.

Everything sucks. But differently.

If there is one proof that IT is still in its infancy, it's the fact that nobody can agree on solutions to common problems, even the ones we face every day, such as configuration.

Do you put things in a text file or in a DB? If it's text, is it imperative code or declarative markup? If it's markup, do you go with XML, TOML, YAML, CSV, INI, JSON, JSON5, etc.?

Over the years, I've seen some terrible things. Custom INI parsers to make the format type-rich. Conf in XML so complex you want to pull your eyes out. And of course, the now-popular monstrosity that is templated YAML.

What if I used my own language for this?

At some point, everybody wondered about that. Hell, there is a whole derivative of Python dedicated to configuration.

The first popular approach that I can remember is Django's settings.py file.

It was, and still is, a blessing and a curse.

A blessing because finally, you could use your entire toolbox to manipulate the file: good IDE support, debuggers, linters... The syntax is rich and expressive, you've got nice data types, and on top of that, you can throw in a little logic to help keep things DRY. Also, believe it or not, actually getting exceptions from your config errors is nice.
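To make this concrete, here is what such a file can look like. This is a hypothetical settings module in the spirit of Django's settings.py, not a complete Django configuration: the paths and app names are made up, but it shows rich types (Path objects, nested dicts) and a little logic to avoid repeating the same root path everywhere.

```python
# settings.py: a hypothetical Django-style settings module (illustrative values)
from pathlib import Path

# One root path, everything else is derived from it: a little logic keeps it DRY
BASE_DIR = Path("/srv/myproject")

DEBUG = False

# Rich data types come for free: lists, dicts, tuples, Path objects...
ALLOWED_HOSTS = ["example.com", "www.example.com"]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}

# Derived values instead of copy-pasted strings
STATIC_ROOT = BASE_DIR / "static"
MEDIA_ROOT = BASE_DIR / "media"
LOG_DIR = BASE_DIR / "logs"

# Comprehensions work too, e.g. one log file per app
INSTALLED_APPS = ["blog", "shop"]
LOG_FILES = {app: LOG_DIR / f"{app}.log" for app in INSTALLED_APPS}
```

Change BASE_DIR, and every derived path follows.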

But a curse because:

Suddenly people had to know how the import system worked. I remember frantically googling DJANGO_SETTINGS_MODULE many times before I got it. By the way, if you are not sure you are super comfy with Python imports, we have a great article on that.
Side effects, and many kinds of bugs with them, came with the package. Great power, great responsibility, and all that.
You have Python! But then, you have Python. With implicit string concatenation, sneaky tuple commas, and, in Python 2.7, Unicode decode errors.
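The comma problems deserve a quick demo, because they don't raise anything: the file loads fine, and the values are just wrong. The setting names below are only illustrative.

```python
# Missing comma: the two literals are silently concatenated into one host
ALLOWED_HOSTS = [
    "example.com"  # <- no comma here
    "www.example.com",
]
print(ALLOWED_HOSTS)  # ['example.comwww.example.com']

# Stray trailing comma: the int becomes a one-element tuple
TIMEOUT = 30,
print(TIMEOUT)  # (30,)

# Missing trailing comma: parentheses alone don't make a tuple,
# so this is a plain str, and iterating over it yields characters
TEMPLATE_DIRS = ("/srv/templates")
print(list(TEMPLATE_DIRS)[:3])  # ['/', 's', 'r']
```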

Despite that, I adopted the practice. I'm a big fan of Python as configuration for my own projects, even if:

It requires discipline for devs to keep the conf declarative. Discipline is not automatically enforceable, so it's prone to failure.
There is no guarantee of reproducibility. You may run the module twice, and it could produce two different results.
You need a Python VM. That works for a Django project, as long as nobody else needs to parse your config file.
If you are a SaaS provider, accepting Python as input is really hard to secure.
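Here is a contrived sketch of the reproducibility problem. The conf source is a made-up string and `evaluate` is a hypothetical helper: since a Python conf is just code, nothing stops it from calling the clock or a random generator, so running the same file twice gives two different configs.

```python
# A conf file is just code, so nothing prevents non-deterministic values
CONF_SOURCE = """
import secrets
import time

SECRET_KEY = secrets.token_hex(16)  # different on every run
STARTED_AT = time.time()            # depends on when it's evaluated
"""


def evaluate(source: str) -> dict:
    """Run the conf source in a fresh namespace and return its UPPERCASE names."""
    namespace: dict = {}
    exec(source, namespace)
    return {name: value for name, value in namespace.items() if name.isupper()}


first = evaluate(CONF_SOURCE)
second = evaluate(CONF_SOURCE)
print(first == second)  # False: same source, two different configs
```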

However, it makes configuration simple to set up. Just a file you declare constants in, you import it, and boom, you have a clean and powerful syntax to express whatever you fancy. No need for special tools, a dedicated parser, or type conversion. Plus, you are familiar with the error reporting.

Mix that up with a few env vars, and you are in business.
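For instance, a minimal sketch of a conf module that declares defaults as constants and lets environment variables override them. The MYAPP_* variable names are hypothetical. Env vars are always strings, so the type conversion is on you.

```python
import os

# Defaults live in the file as plain constants...
DEBUG = False
DB_HOST = "localhost"
DB_PORT = 5432

# ...and a few env vars override them. Env vars are always strings,
# so convert them back to the type you actually want.
DEBUG = os.environ.get("MYAPP_DEBUG", str(DEBUG)).lower() in {"1", "true", "yes"}
DB_HOST = os.environ.get("MYAPP_DB_HOST", DB_HOST)
DB_PORT = int(os.environ.get("MYAPP_DB_PORT", DB_PORT))
```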

When not to

Using Python for configuration is neither always possible nor always desirable. If you expose the configuration to the outside world, you likely want something dumber. If you share the configs with other systems, you need something anybody can parse. Also, if the file is to be placed in a standard OS folder for config, do you really want to make that importable?

There are tons of good reasons not to use Python for your configuration.

In general, I tend to choose:

Python for my own projects when I know only devs will touch it, and the configuration is not shared.
TOML, if non-dev humans will need to touch the config, and it's a small one.
JSON, if non-dev humans will need to touch the config, and it's a big one. Or if the conf needs to be shared. JSON5 if performance is not an issue, so I can haz comments. Otherwise, regular JSON.
YAML if the system I use, like the CI, requires me to. Begrudgingly.

If you have to use any of those, though, I strongly recommend investing in a Pydantic schema to define and validate the content of the configuration. Yes, even if the conf is in Python in the first place. I will write an article on that one day, but in short, it serves both as a safety net and as documentation.

You see, it solves one of the biggest problems with Django's settings: you don't know what to put in them, and when you get it wrong, the site breaks with weird errors.
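A minimal sketch of what such a schema can look like, assuming Pydantic v2 (`model_validate` and `ValidationError.error_count()` are v2 APIs). The fields are hypothetical. The model doubles as documentation of what the conf may contain, and a bad value is reported at load time instead of surfacing later as a weird error.

```python
from pydantic import BaseModel, Field, ValidationError


class DatabaseConf(BaseModel):
    host: str = "localhost"
    port: int = 5432


class AppConf(BaseModel):
    # No default: forgetting it is a validation error, not a runtime surprise
    secret_key: str
    debug: bool = False
    allowed_hosts: list[str] = []
    database: DatabaseConf = Field(default_factory=DatabaseConf)


conf = AppConf.model_validate({"secret_key": "s3cr3t", "database": {"port": "5433"}})
print(conf.database.port)  # 5433, coerced from the string "5433"

try:
    AppConf.model_validate({"debug": "maybe"})
except ValidationError as exc:
    # Reports both the missing secret_key and the invalid debug value
    print(exc.error_count())  # 2
```

`model_validate()` takes a plain dict, so the same model works whether the data came from TOML, JSON or a Python conf file.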

Another thing: if you have to use Python, don't add the parent folder to sys.path to make it importable. Treat it like a regular text config file: get the file path, load the content, and in this case run it with exec(). This solves another common problem with Python confs, which is that they require the user to know too much about the import system.

But I digress, this is the section where I was supposed to talk about not using Python.

So if you are not using Python for the conf, maybe you also cannot use Pydantic to validate it.

In that case, you should define the schema in a format that many languages can understand, or use an external tool.

I lean toward the latter, and would recommend CUELang for the job. It's a DSL (I know, I know) specialized in configuration, with the following properties:

Not Turing complete, yet expressive enough to stay DRY.
Reproducible outputs, and the runtime is safe and sandboxed.
Can define both the schema and the data in the same language, in the same file or in separate ones. With union types.
Generates YAML or JSON. Can validate itself, or a YAML/JSON file.

The biggest drawback is that the only implementation is currently in Go, meaning you may have to call it through a subprocess or FFI.

Basically, it has the benefits of a Jinja template without the drawbacks. You will still produce YAML or JSON eventually, for the systems that need to consume the result, but it will be clean input and clean output. I will write an article on that one day too.

I realize that maybe one day I should write a full series of articles about configuration to demonstrate those points with examples and code. But for now, this short article will have to suffice.

Spencer Finkel

Mar 14, 2024

I recently implemented a config using pydantic-settings and found it quite extensible. For instance, I built a custom config source for SSM parameters that caches results and provides fallbacks and per-environment specification. Thinking about how we've done that in the past for Java apps is a whole messy thing.


Brozozowski

Jan 24, 2024

Why do you consider YAML to be so much worse than JSON? The article you linked didn't seem to explain more than that.



Bite code!
