There is no such thing as a raw text file

Jun 5, 2023

Also, there is no "binary language"

2 Comments

Jun 5, 2023

I imagine many older IT folk are squirming a bit. In the old days, operating systems provided distinct interfaces for dealing with a few different kinds of files. If you tried to open a file with the wrong interface, your application would fail, often with no opportunity to recover. Unix systems introduced a great simplification where all I/O devices were presented as a file (or file-like) object in the file tree. You used one common, simple API: open or close, read or write, seek or tell, and control or status. Life became much simpler for developers in some ways, and more complex in others. We had to design, and hopefully standardize, the file formats mentioned in the article to store audio, video, graphics, etc.

The language run-time libraries provide the ability to use such an entity as a line-oriented text file, a record-oriented FORTRAN file, a block-oriented COBOL file, or just a plain stream of bytes. The *nix OS directly provides the plain stream of bytes, with all the other formats built on top of that. Probably the most common is the line-oriented text file. You ask the run-time to give you lines of bytes which are separated by newline characters, and you ask the run-time to write lines of bytes separated by newline characters. Life was good.
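To make the "lines built on top of a byte stream" idea concrete, here is a minimal sketch of what a run-time library might do. The function name `iter_lines` and the `chunk_size` parameter are my own illustration, not anything from a real library, and it only handles the Unix newline convention.

```python
import io

def iter_lines(stream, chunk_size=4096):
    """Yield lines (as bytes, without the trailing newline) from a byte stream.

    Hypothetical helper: it only uses stream.read(), the plain
    byte-stream interface, and builds the notion of "lines" on top.
    """
    pending = b""  # bytes read so far that do not yet form a complete line
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:  # final line with no trailing newline
        yield pending

# io.BytesIO stands in for any byte stream (file, pipe, socket...).
raw = io.BytesIO(b"first\nsecond\nthird")
lines = list(iter_lines(raw, chunk_size=4))
print(lines)
```

A tiny `chunk_size` is used here only to show that lines spanning several reads are reassembled correctly.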

Then in the 1980s along came the IBM PC and the Apple Macintosh. PC DOS used a carriage return followed by a newline, and Mac OS used just a carriage return, to separate lines in "text" files. By this time, the internet (but not yet as we know it now) linked thousands of computers together, and facilitated the transfer of data (often via files). Inter-op was a big buzzword and selling feature - it was very important for computers to talk to each other. And that meant that the new microcomputers had to talk to the rest of the world, generally by exchanging files. That is when the distinction between binary and text files became critical. Our apps had to know which of the three conventions was used to separate lines in the text files.

Modern versions of Python have a feature called universal newlines. When you open a file as a text file, e.g. with mode='t' (the default), the Python run-time library will recognize the newline convention used in the file and properly separate the byte stream into lines for you. And when you write to a text file, the run-time will use the usual line separator character(s) for the OS between the lines you write. When you open a text file in binary mode, you may need to perform this discovery yourself, and check which OS you are running on to determine the default line separator.
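A small sketch of the difference, assuming a throwaway file named `mixed.txt` (a name I made up) that deliberately mixes all three conventions:

```python
import os
import tempfile

# One file, three line-ending conventions: Unix, DOS, and old Mac OS.
data = b"unix\ndos\r\nold mac\rlast"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")

    # Binary mode writes the bytes exactly as given, on any OS.
    with open(path, "wb") as f:
        f.write(data)

    # Text mode with the default newline=None: universal newlines.
    # Every line ending is handed back as "\n".
    with open(path, "rt", encoding="ascii") as f:
        text_lines = f.readlines()

    # Binary mode: no translation, you get the stored bytes.
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(text_lines)
print(raw_bytes)
```

In text mode the three separators all come back as `"\n"`; in binary mode the `\r\n` and `\r` are still there and it is up to you to deal with them.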

Under the hood, so to speak, the Python run-time is accessing a binary stream of bytes. It provides convenient services that depend on the file mode to make life easier for all Python developers. All we need to know is how we want to use the file. And hopefully that matches the content of the file.
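The layering can be seen directly with the standard `io` module. This is only an illustration using an in-memory byte stream in place of a real file; the variable names are mine.

```python
import io

# The bottom layer: nothing but bytes (UTF-8 encoded, DOS line ending).
byte_stream = io.BytesIO("caf\u00e9\r\n".encode("utf-8"))

# The text layer wraps the byte stream and adds decoding plus
# newline translation. open(..., "rt") builds a similar stack.
text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8", newline=None)

line = text_stream.readline()

print(repr(line))                    # decoded str, newline normalized
print(text_stream.buffer is byte_stream)  # the bytes layer is still there
print(byte_stream.getvalue())        # the underlying bytes are unchanged
```

The text object never owns the data; it is a view over the byte stream, reachable through its `buffer` attribute.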


Bite Code!

Jun 6, 2023

That's probably the best comment I ever had on this blog.

