Tags: python

I often find myself writing Python scripts that iterate over the files in a directory, for fun and profit. This is so common that I wrote a simple helper function in Auxly called walkfiles(). The performance of walkfiles() always felt kinda sluggish, especially compared to blazing-fast utilities like fd. Only recently did I learn about os.scandir() in the Python standard library, which is essentially a more performant os.listdir(): it yields os.DirEntry objects that carry file-type information from the directory listing itself, so checking whether an entry is a file or a directory usually needs no extra system call. After running some benchmarks, I found the speed improvement from os.scandir() impressive. Here are the results using Python 3.7.2 on a decently spec'd Windows 10 laptop:

  • Recursively iterating through all 107,000 files under a directory using walkfiles() from Auxly 0.6.4, which was implemented using os.walk():
    • Five samples (seconds) = 9.96875, 9.796875, 9.421875, 9.453125, 9.8125
    • Average (seconds) = 9.690625
  • The same test using a refactored walkfiles() implemented with os.scandir():
    • Five samples (seconds) = 2.359375, 1.90625, 1.859375, 1.71875, 1.734375
    • Average (seconds) = 1.915625
  • Using fd to iterate over the same 107,000 files:
    • Five samples (seconds) = 0.46875, 0.5, 0.46875, 0.515625, 0.484375
    • Average (seconds) = 0.4875
  • Using the same file set with a regex filter to iterate over only the 3,600 .txt files, with walkfiles() from Auxly 0.6.4 (before the refactor):
    • Five samples (seconds) = 3.5625, 3.0625, 3.109375, 3.109375, 3.078125
    • Average (seconds) = 3.184375
  • Iterating over the 3,600 .txt files with refactored walkfiles():
    • Five samples (seconds) = 2.109375, 1.953125, 1.78125, 1.796875, 1.8125
    • Average (seconds) = 1.890625
  • Using fd to iterate over the 3,600 .txt files:
    • Five samples (seconds) = 0.03125, 0.03125, 0.0625, 0.078125, 0.046875
    • Average (seconds) = 0.05
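
For anyone who wants to collect numbers like these, here is a minimal sketch of a five-sample timing harness. It is not the script used for the results above: the names take_samples() and count_files() are made up for illustration, and time.perf_counter() is an assumed choice of clock.

```python
import os
import time
from statistics import mean


def take_samples(func, runs=5):
    """Call func() `runs` times and return the elapsed seconds of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def count_files(root):
    """Hypothetical workload: exhaust an os.walk() traversal and count files."""
    return sum(len(files) for _dirpath, _dirs, files in os.walk(root))


def report(samples):
    """Format the samples in the same style as the bullet lists above."""
    listed = ", ".join(str(s) for s in samples)
    return f"Five samples (seconds) = {listed}\nAverage (seconds) = {mean(samples)}"
```

Calling print(report(take_samples(lambda: count_files("some/dir")))) would print one sample list and its average. The first run often comes out slower than the rest because the operating system has not cached the directory metadata yet, which is one reason to take several samples.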

Wow, some big differences there. To summarize:

  • Iterating over 107,000 files:
    • Auxly 0.6.4 walkfiles() = 9.690625 seconds
    • Auxly refactored walkfiles() = 1.915625 seconds
    • fd = 0.4875 seconds
  • Iterating over 3,600 .txt files in the original 107,000:
    • Auxly 0.6.4 walkfiles() = 3.184375 seconds
    • Auxly refactored walkfiles() = 1.890625 seconds
    • fd = 0.05 seconds
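
To put those averages in relative terms, the ratios can be worked out with a few lines of Python. The numbers are copied from the summary above; the dictionary layout and the speedup() helper are just for illustration.

```python
# Average times in seconds, copied from the summary above.
averages = {
    "all 107,000 files": {"old": 9.690625, "refactored": 1.915625, "fd": 0.4875},
    "3,600 .txt files": {"old": 3.184375, "refactored": 1.890625, "fd": 0.05},
}


def speedup(slower, faster):
    """Return how many times faster the second timing is than the first."""
    return slower / faster


for label, t in averages.items():
    old_vs_new = speedup(t["old"], t["refactored"])
    new_vs_fd = speedup(t["refactored"], t["fd"])
    print(f"{label}: refactored is {old_vs_new:.1f}x faster than 0.6.4, "
          f"fd is {new_vs_fd:.1f}x faster than refactored")
```

So the refactor is roughly 5.1x faster on the full set and about 1.7x faster with the .txt filter, while fd stays about 3.9x and 37.8x ahead of the refactored version, respectively.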

The takeaway here is that os.scandir() is fast. Not fd fast, but still not bad. This refactored walkfiles() will be included in an upcoming Auxly release.
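
For a rough idea of what a scandir-based file walker can look like, here is a minimal sketch. It is not the actual Auxly code: the name walkfiles_sketch() and its parameters (startdir, regex, recurse) are assumptions made for this example, and it relies on os.scandir() working as a context manager, which needs Python 3.6 or later.

```python
import os
import re


def walkfiles_sketch(startdir, regex=None, recurse=True):
    """Yield the path of each file under startdir.

    If regex is given, only files whose name matches it are yielded.
    Hypothetical helper for illustration, not the Auxly implementation.
    """
    pattern = re.compile(regex) if regex else None
    with os.scandir(startdir) as entries:
        for entry in entries:
            # DirEntry usually answers is_dir()/is_file() from data cached
            # during the directory listing, avoiding extra stat calls.
            if entry.is_dir(follow_symlinks=False):
                if recurse:
                    yield from walkfiles_sketch(entry.path, regex, recurse)
            elif entry.is_file():
                if pattern is None or pattern.search(entry.name):
                    yield entry.path
```

Something like list(walkfiles_sketch(".", regex=r"\.txt$")) would then collect only the .txt files. Not following directory symlinks is a design choice made here to avoid loops; a real helper might make that configurable.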

Hi, I am Jeff Rimko!
A computer engineer and software developer in the greater Pittsburgh, Pennsylvania area.