Why is os.sep insufficient for path operations?
I was reading the documentation for Python’s os module recently, and a sentence caught my eye:
os.sep
- The character used by the operating system to separate pathname components. This is
'/'
for POSIX and'\\\\'
for Windows. Note that knowing this is not sufficient to be able to parse or concatenate pathnames — useos.path.split()
andos.path.join()
— but it is occasionally useful.
I was naturally curious: when is os.sep
not sufficient?
I decided to have a peek at the implementation of os.path.split()
and os.path.join()
– how are they more complicated than a simple str.split()
and str.join()
?
First, it’s worth understanding how os.path
works.
Because paths behave differently on different platforms, there are multiple modules for manipulating paths in the standard library – posixpath
for UNIX-style paths, and ntpath
for Windows-style paths. Older versions of Python included modules like macpath
for old MacOS-style paths and os2emxpath
for OS/2 EMX paths.
When you use os.path
, it selects the appropriate module for your platform. Using os.path
means your code should do the right thing, even if the code is run on a different platform to where you wrote it. If you do want a specific style of path, you can import the specific module, which is what I’ll do in my examples.
I looked at four files to find out what these functions do:
- cpython/Lib/ntpath.py
- cpython/Lib/posixpath.py
- cpython/Lib/test/test_ntpath.py
- cpython/Lib/test/test_posixpath.py
Even without reading the code, the length of the functions and their tests give a clue as to how complicated they are.
Here are a few of my favourite examples where paths are trickier than simple strings:
-
The
os.path.split()
function can recognise that multiple path separators are equivalent to a single separator:>>> posixpath.split('/Users/alexwlchan///blog.txt') ('/Users/alexwlchan', 'blog.txt') >>> ntpath.split('c:\\\\Users\\alexwlchan\\\\\\blog.txt') ('c:\\\\Users\\alexwlchan', 'blog.txt')
-
The
os.path.join()
function only inserts a path separator as needed, and skips it if it already sees one:>>> posixpath.join('/x', 'y', 'z') '/x/y/z' >>> posixpath.join('/x', 'y/', 'z') '/x/y/z' >>> ntpath.join('c:\\x', 'y', 'z') 'c:\\x\\y\\z' >>> ntpath.join('c:\\x', 'y\\', 'z') 'c:\\x\\y\\z'
-
Both
os.path.split()
andos.path.join()
can cope with paths on Windows, which can use backslash and forward slash:>>> ntpath.join('/a/b', 'x/y') '/a/b\\x/y' >>> ntpath.join('/a/b/', 'x/y') '/a/b/x/y' >>> ntpath.split('/a/b\\x/y') ('/a/b\\x', 'y') >>> ntpath.split('/a/b/x/y') ('/a/b/x', 'y')
There’s also a lot of code for Windows drive letters and UNC paths, two concepts I’m totally unfamiliar with.
I don’t pretend to have a complete understanding of how paths work after this quick skim – but I did enjoy digging into it, and now I feel I have a better understanding of why I shouldn’t use os.sep
to do my own path operations.