Python - How to traverse a filesystem directory

Every so often you will need to write code that traverses a directory tree. In my experience, these tend to be one-off scripts or cleanup scripts run from cron. Python provides several useful ways of walking a directory structure. We cover the best of them.

Testing directory structure

Here is my test filesystem tree. The root is /test.

~] tree -a /test
/test
├── A
│   ├── AA
│   │   └── aa.png
│   ├── a.png
│   └── a.txt
├── B
│   ├── BB
│   └── b.txt
├── broken_symlink -> /aaa
├── symlink -> /etc
├── .test
├── test.png
└── test.txt

python os.walk()

os.walk()
os.walk(top, topdown=True, onerror=None, followlinks=False)
  • Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames)
  • dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, use os.path.join(dirpath, name). Whether or not the lists are sorted depends on the file system. If a file is removed from or added to the dirpath directory while the lists are being generated, it is unspecified whether a name for that file is included.
  • If optional argument topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
  • By default, walk() will not walk down into symbolic links that resolve to directories. Set followlinks to True to visit directories pointed to by symlinks, on systems that support them.
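
To make the effect of topdown concrete, here is a minimal sketch that records the order in which directories are yielded. It builds its own throwaway tree with tempfile rather than touching /test; the helper name walk_order and the A/AA/B layout are illustrative assumptions, not part of os.walk() itself.

```python
import os
import tempfile


def walk_order(top, topdown=True):
    """Return the directory paths in the order os.walk() yields them."""
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(top, topdown=topdown)]


# Build a small throwaway tree: <tmp>/A/AA and <tmp>/B (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "A", "AA"))
    os.makedirs(os.path.join(tmp, "B"))

    top_down = walk_order(tmp, topdown=True)
    bottom_up = walk_order(tmp, topdown=False)

    # Top-down yields the root first; bottom-up yields it last.
    print("top-down: ", [os.path.relpath(p, tmp) for p in top_down])
    print("bottom-up:", [os.path.relpath(p, tmp) for p in bottom_up])
```

Bottom-up order is handy for cleanup scripts, for example when empty subdirectories should be removed before their parent is examined.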

os.walk() example 1

import os

for root, subfolders, filenames in os.walk("/test"):
    print(root, subfolders, filenames)

# output:
/test ['A', 'B', 'symlink'] ['.test', 'test.png', 'test.txt', 'broken_symlink']
/test/A ['AA'] ['a.txt', 'a.png']
/test/A/AA [] ['aa.png']
/test/B ['BB'] ['b.txt']
/test/B/BB [] []
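
When walking top-down, os.walk() descends into whatever names remain in the subfolders list, so editing that list in place is one way to skip a subtree. The sketch below assumes a hypothetical helper walk_skipping and a temporary tree that loosely mirrors /test; note that it skips every directory with a matching name, at any depth.

```python
import os
import tempfile


def walk_skipping(top, skip_names):
    """Yield (dirpath, filenames), never descending into directories named in skip_names."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment mutates the same list object os.walk() recurses into;
        # rebinding the name (dirnames = [...]) would have no effect.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        yield dirpath, filenames


# Throwaway tree: <tmp>/A/AA and <tmp>/B/BB (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "A", "AA"))
    os.makedirs(os.path.join(tmp, "B", "BB"))

    visited = [os.path.relpath(dirpath, tmp) for dirpath, _ in walk_skipping(tmp, {"B"})]
    print(sorted(visited))  # B and B/BB are absent
```

This only works with topdown=True; in bottom-up mode the subdirectories have already been visited by the time their parent is yielded.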

os.walk() example 2

import os

for root, subfolders, filenames in os.walk("/test"):
    for file in filenames:
        print(os.path.join(root, file))

# output:
/test/.test
/test/test.png
/test/test.txt
/test/broken_symlink
/test/A/a.txt
/test/A/a.png
/test/A/AA/aa.png
/test/B/b.txt
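
A common variation on the loop above is to keep only files with a given extension. The following is a minimal sketch under stated assumptions: find_by_extension is a hypothetical helper, and the temporary tree stands in for /test so the snippet can run anywhere.

```python
import os
import tempfile


def find_by_extension(top, extension):
    """Return sorted full paths of files under top whose extension matches."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # os.path.splitext("a.png") gives ("a", ".png"); compare case-insensitively.
            if os.path.splitext(name)[1].lower() == extension.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Throwaway tree with a few empty files (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "A", "AA"))
    for rel in ("test.png", "test.txt",
                os.path.join("A", "a.png"), os.path.join("A", "AA", "aa.png")):
        open(os.path.join(tmp, rel), "w").close()

    png_files = [os.path.relpath(p, tmp) for p in find_by_extension(tmp, ".png")]
    print(png_files)
```

A dotfile such as .test has no extension as far as os.path.splitext is concerned, so it never matches a filter like ".test".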