Summary
Python shells are incredibly useful, but every time you start one, you tend to repeat the same things over and over.
PYTHONSTARTUP can help with this: set it to the path of a Python script, and that script will be executed automatically when, and only when, an interactive Python shell starts. Anything that is imported or defined in the startup script will be available in the shell.
Python is as good as the shell is
Even with the best IDE in the world and a lot of experience under your belt, the Python shell is still a formidable tool for exploratory programming. Testing a new API, mangling some data, automating some boring task...
That's why projects like ipython and jupyter became so successful, and I still use them to this day.
However, every time you start a shell, the first thing you usually do is import a bunch of stuff, or frantically press the up arrow key to recall something from your history. This is made worse by the fact that Python has very limited support for reloading changed modules in a shell, so restarting it is a common thing.
But there is a little-known feature that can make all that REPL back-and-forth a more enjoyable experience: the PYTHONSTARTUP environment variable.
This article is the reason I wrote "Environment variables for beginners" a few days ago: I wanted this trick to be accessible to everybody.
Running a script when a shell starts
PYTHONSTARTUP can be set to the path of a Python file. Here is what mine is set to:
echo $PYTHONSTARTUP
/home/user/Scripts/pythonstartup.py
The file doesn't need to be called "pythonstartup.py", but I like to make it explicit.
This file is a regular Python module; there is nothing special about it. It doesn't need to be in a specific location, it can contain any legal Python code, and it will work even if it's not importable.
Now the nice thing is, when a Python shell starts, this file will be automatically executed.
This happens only when a Python shell starts, not when a regular Python program starts. None of your scripts will be affected. This is solely a tool for making your life easier in an interactive session.
What to put in your startup script?
To understand how useful this is, I must add one detail: the entire namespace of the script is made available in your shell.
This means anything you import, anything you set in a variable or any function you define in this startup script will now always be available in your shell at startup.
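To make the namespace point concrete: the interactive interpreter reads the file named by PYTHONSTARTUP and executes it in the same namespace it then uses for your commands. The sketch below approximates that behavior with the standard `code` module. `startup_namespace()` and `start_shell()` are hypothetical helpers written for illustration, not part of Python, and they gloss over details such as error reporting.

```python
import code
import os


def startup_namespace(path=None):
    """Execute a startup file and return the namespace it populated.

    Roughly what the REPL does with PYTHONSTARTUP: the file runs in the
    same dict that then serves as the globals of the interactive session.
    """
    path = path or os.environ.get("PYTHONSTARTUP")
    namespace = {"__name__": "__main__"}
    if path and os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            exec(compile(f.read(), path, "exec"), namespace)
    return namespace


def start_shell(path=None):
    """Open a basic REPL whose globals are the startup script's namespace."""
    code.interact(local=startup_namespace(path))
```

Every name the script binds (imports, variables, functions) ends up as a key in the returned dict, which is why it is directly usable at the prompt.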
You don't have to have a single script. You can have one script for each project if you wish, and make your shell load all the things you need to work on that specific project.
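One way to get per-project behavior from a single global script is to have it also run a project-local file when one exists in the directory the shell was started from. This is a sketch under assumptions: the file name `.pythonstartup.py` and the helper `load_local_startup()` are hypothetical conventions, and pointing PYTHONSTARTUP at a different file for each project works just as well.

```python
from pathlib import Path

# Hypothetical per-project file name; pick whatever convention you like.
LOCAL_STARTUP_NAME = ".pythonstartup.py"


def load_local_startup(namespace, directory=None):
    """Run <directory>/.pythonstartup.py inside `namespace` if it exists.

    Returns the path that was executed, or None if there was no local file.
    """
    directory = Path.cwd() if directory is None else Path(directory)
    local_file = directory / LOCAL_STARTUP_NAME
    if not local_file.is_file():
        return None
    source = local_file.read_text(encoding="utf-8")
    exec(compile(source, str(local_file), "exec"), namespace)
    return local_file


# Called at the end of the global startup script, so project names win.
load_local_startup(globals())
```

Be aware that this executes whatever file with that name sits in the current directory, so only rely on it in directories whose contents you trust.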
This also works in ipython and jupyter, and I've seen a lot of people use it at least to preload their favorite data manipulation libraries like this:
import pandas as pd
import numpy as np
You start your shell, numpy and pandas are ready to go.
You can make this more robust by catching ImportError:
try:
    import pandas as pd
except ImportError:
    pass
try:
    import numpy as np
except ImportError:
    pass
This way, if they are not installed, the script still works.
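If the list of optional libraries grows, the repeated try/except blocks can be folded into a loop built on `importlib.import_module`. The `OPTIONAL_IMPORTS` mapping and the `optional_imports()` helper are illustrative names chosen for this sketch, not something Python provides.

```python
import importlib

# Alias to bind in the shell -> module to import (illustrative choices).
OPTIONAL_IMPORTS = {
    "pd": "pandas",
    "np": "numpy",
}


def optional_imports(mapping, namespace):
    """Import each installed module and bind it under its alias.

    Returns the names of the modules that could not be imported.
    """
    missing = []
    for alias, module_name in mapping.items():
        try:
            namespace[alias] = importlib.import_module(module_name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(module_name)
    return missing


optional_imports(OPTIONAL_IMPORTS, globals())
```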
But even without going that far, you can import things you use a lot from the stdlib, such as json, the datetime classes, pathlib.Path and so on...
You can even open a file and load the data or configuration from it, or create a connection pool.
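For instance, the startup script can read a JSON settings file once, so its contents are already there when the prompt appears. `CONFIG_PATH` below is a made-up location and `load_json_config()` a hypothetical helper; it falls back to an empty dict so that a missing or broken file never prevents the shell from starting.

```python
import json
from pathlib import Path

# Made-up location: point this at the file you actually want preloaded.
CONFIG_PATH = Path.home() / ".config" / "myproject" / "settings.json"


def load_json_config(path):
    """Return the parsed JSON at `path`, or {} if it is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return {}


CONFIG = load_json_config(CONFIG_PATH)
```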
django_extensions provides the shell_plus command, which loads all the ORM models so you can query the database of the current project, but you can do something similar manually with PYTHONSTARTUP in any project.
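A rough, framework-agnostic take on that idea is to bind every public class defined in your project's models module straight into the shell namespace. `load_classes()` and the `myproject.models` module are hypothetical; in a Django project you would instead call `django.setup()` and iterate over `django.apps.apps.get_models()`.

```python
import importlib
import inspect


def load_classes(module_name, namespace):
    """Bind every public class defined in `module_name` into `namespace`.

    Classes that the module merely imports are skipped, so you only get the
    ones it defines. Returns the bound names (alphabetical, as getmembers sorts).
    """
    module = importlib.import_module(module_name)
    loaded = []
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if obj.__module__ == module.__name__ and not name.startswith("_"):
            namespace[name] = obj
            loaded.append(name)
    return loaded


# Hypothetical project module; a missing project must not break the shell.
try:
    load_classes("myproject.models", globals())
except ImportError:
    pass
```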
A real-life PYTHONSTARTUP script
import atexit
# First, a lot of imports. I don't use all of them all the time,
# but I like to have them available.
import csv
import datetime as dt
import hashlib
import json
import math
import os
import random
import re
import shelve
import subprocess
import sys
import tempfile
from collections import *
from functools import partial
from inspect import getmembers, ismethod, stack
from io import open
from itertools import *
from math import *
from pprint import pprint as pretty_print
from types import FunctionType
from uuid import uuid4
from unittest.mock import patch, Mock, MagicMock
from datetime import datetime, date, timedelta
# Set the ipython prompt to ">>> " for easier copying
try:
    from IPython import get_ipython
    get_ipython().run_line_magic("doctest_mode", "")
    get_ipython().run_line_magic("load_ext", "ipython_autoimport")
except Exception:
    # IPython is missing, we are not running under it (get_ipython()
    # returns None), or the extension is not installed
    pass
try:
    import asyncio
    # for easier pasting
    from typing import *
    from dataclasses import dataclass, field
except ImportError:
    pass
# Mostly to parse strings to dates
try:
    import pendulum
except ImportError:
    pass
# I think you know why
try:
    import requests
except ImportError:
    pass
# If I'm in a regular Python shell, at least activate tab completion
try:
    import readline
    readline.parse_and_bind("tab: complete")
except ImportError:
    pass
try:
    # if rich is installed, pretty print the repr() of every result
    from rich import pretty
    pretty.install()
except ImportError:
    pass
# I wish Python had a Path literal, but I can get pretty close with this:
# it lets me do p/"path/to/file" to get a Path object, and any "{name}"
# placeholder in the string is filled in from the caller's global variables
from pathlib import Path
class PathLiteral:
    def __truediv__(self, other):
        try:
            return Path(other.format(**stack()[1][0].f_globals))
        except KeyError as e:
            raise NameError("name {e} is not defined".format(e=e))
    def __call__(self, string):
        return self / string
p = PathLiteral()
# Force jupyter to print any lone variable, not just the last one in a cell
try:
    from IPython.core.interactiveshell import InteractiveShell
    InteractiveShell.ast_node_interactivity = "all"
except ImportError:
    pass
# Check if I'm in a venv
VENV = os.environ.get("VIRTUAL_ENV")
# Make sure I always have a temp folder ready to go
TEMP_DIR = Path(tempfile.gettempdir()) / "pythontemp"
try:
    os.makedirs(TEMP_DIR, exist_ok=True)
except OSError:
    pass
# I'm lazy
def now():
    return datetime.now()
def today():
    return date.today()
# Since restarting a shell is common, I like to have a way to persist
# calculations between sessions. This is a simple way to do it:
# I can do store.foo = 'bar' and get store.foo back in the next session.
class Store(object):
    def __init__(self, filename):
        object.__setattr__(self, "DICT", shelve.DbfilenameShelf(filename))
        # write everything to disk and close the file on the way out
        atexit.register(self._clean)
    def __getattribute__(self, name):
        if name not in ("DICT", "_clean"):
            try:
                return self.DICT[name]
            except KeyError:
                # unknown keys read as None instead of raising
                return None
        return object.__getattribute__(self, name)
    def __setattr__(self, name, value):
        if name in ("DICT", "_clean"):
            raise ValueError("'%s' is a reserved name for this store" % name)
        self.DICT[name] = value
    def _clean(self):
        self.DICT.sync()
        self.DICT.close()
python_version = "py%s" % sys.version_info.major
try:
    store = Store(os.path.join(TEMP_DIR, "store.%s.db" % python_version))
except Exception:
    # This could be solved using diskcache, but I never took the time
    # to do it.
    print("\n/!\\ A session using this store already exists.")
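# Shortcut to pip install packages without leaving the shell
def pip_install(*packages):
    """Install packages into the active virtualenv, directly from the shell."""
    # sys.real_prefix is set by the legacy virtualenv tool; the stdlib venv
    # module makes sys.prefix differ from sys.base_prefix instead
    if not (hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix):
        raise ValueError("Not in a virtualenv")
    # pip.main() no longer exists since pip 10 and pip has no supported
    # Python API, so run pip with the current interpreter in a subprocess
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])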
def is_public_attribute(obj, name, methods=()):
    return not name.startswith("_") and name not in methods and hasattr(obj, name)
# Used by the pprint() fallback when rich is not installed
def attributes(obj):
    members = getmembers(type(obj))
    methods = {name for name, val in members if callable(val)}
    is_allowed = partial(is_public_attribute, methods=methods)
    return {name: getattr(obj, name) for name in dir(obj) if is_allowed(obj, name)}
STDLIB_COLLECTIONS = (
    str,
    bytes,
    int,
    float,
    complex,
    memoryview,
    dict,
    tuple,
    set,
    bool,
    bytearray,
    frozenset,
    slice,
    deque,
    defaultdict,
    OrderedDict,
    Counter,
)
try:
    # rich is a great pretty printer, but if it's not there,
    # I have a decent fallback
    from rich.pretty import pprint
except ImportError:
    def pprint(obj):
        if isinstance(obj, STDLIB_COLLECTIONS):
            pretty_print(obj)
        else:
            try:
                name = "class " + obj.__name__
            except AttributeError:
                name = obj.__class__.__name__ + "()"
            print(name + ":")
            attrs = attributes(obj)
            if not attrs:
                print("    <No attributes>")
            for attr_name, val in attrs.items():
                print("   ", attr_name, "=", val)
# pp/obj is a shortcut for pprint(obj), and obj/pp works too, as a kind of
# postfix operator, which is handy in the shell
class Printer(float):
    def __call__(self, *args, **kwargs):
        pprint(*args, **kwargs)
    def __truediv__(self, other):
        pprint(other)
    def __rtruediv__(self, other):
        pprint(other)
    def __repr__(self):
        return repr(pprint)
pp = Printer()
pp.__doc__ = pprint.__doc__
# Same as the printer, but for turning something into a list with l/obj
class ToList(list):
    def __truediv__(self, other):
        return list(other)
    def __rtruediv__(self, other):
        return list(other)
    def __call__(self, *args, **kwargs):
        return list(*args, **kwargs)
l = ToList()
# These aliases make JSON's null, true and false valid Python names,
# so JSON can be copied and pasted straight into the shell
null = None
true = True
false = False
# faker is a great library to generate fake data, so I have a shortcut for it.
# If I want 10 fake emails, I can do fake.email(10)
try:
    import faker
except ImportError:
    pass
else:
    from faker.providers import internet, geo
    def get_faker(locale="en"):
        fake = faker.Faker(locale)
        fake.add_provider(internet)
        fake.add_provider(geo)
        return fake
    class Fake(object):
        factory = get_faker()
        @property
        def fr(self):
            self.factory = get_faker("fr_FR")
            return self
        @property
        def en(self):
            self.factory = get_faker()
            return self
        def __getattr__(self, name):
            faker_provider = getattr(self.factory, name)
            return lambda count=1: self.call_faker(faker_provider, count)
        def __dir__(self):
            attrs = [
                attr for factory in self.factory._factories for attr in dir(factory)
            ]
            return ["fr", "en", *attrs]
        def call_faker(self, faker_provider, count=1):
            if count == 1:
                return faker_provider()
            else:
                return [faker_provider() for _ in range(count)]
    fake = Fake()
Tips and tricks
Be careful if you have a single startup script but several versions of Python. At the very least, stick to syntax they all understand, and wrap anything that may not be available everywhere in try/except.
Measure the impact on startup time. Some libraries, like scipy or pandas, are quite heavy and can noticeably slow down your shell startup.
Use del on anything you imported or defined but don't want to end up in the shell namespace.
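A quick way to see what the startup script costs is to time it directly. The helper below is a sketch, not a standard tool: `time_startup_script()` is a hypothetical name, and it executes the file in a throwaway namespace and returns the seconds spent running it. Call it from a fresh interpreter, because modules that are already imported are cached in `sys.modules` and would make the script look faster than it is. On Python 3.7+, `python -X importtime` also gives a per-module breakdown of import costs.

```python
import os
import time


def time_startup_script(path=None):
    """Execute the startup file in a throwaway namespace.

    Returns the seconds spent executing it (compilation is not timed).
    """
    path = path or os.environ.get("PYTHONSTARTUP")
    if not path:
        raise ValueError("PYTHONSTARTUP is not set and no path was given")
    with open(path, encoding="utf-8") as f:
        compiled = compile(f.read(), path, "exec")
    namespace = {"__name__": "__main__"}
    start = time.perf_counter()
    exec(compiled, namespace)
    return time.perf_counter() - start


# Example: print(f"startup script took {time_startup_script():.3f}s")
```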
Lately, I’ve developed a practice of creating an “interact.py” file near the code that I’m working on. I put imports, some magic values, and helpers there that I need while fiddling with things. Then, I just run `$python -i interact.py`. This is my local way of dealing with the same frustration, and I haven’t thought about more general approach yet. That’s interesting.
Hey this is pretty awesome, I didn't know this existed at all. I like your script as well, there are some useful utilities in there. I have two questions:
- On line 266 you say setting this aliases makes JSON NOT valid Python, but I get the feeling it is the other way around. Is that correct? If it works like I think it works then it is a smart trick!
- Do you think I can expect this to work everywhere, or would certain IDEs like Pycharm perhaps use their own startup scripts overwriting my own? Pycharm has a pretty nice built-in REPL for instance.