Return total number of files in directory and subdirectories
@GWarner os.walk yields a separate list of files for each subdirectory. To get the total number of files, you must sum the lengths of those lists. If you only collect len(files) on each iteration, you get a list in which each element is the number of files in the corresponding subdirectory.
Note that you need to use forward slashes (or escaped backslashes, \\) instead of single backslashes as you have here; otherwise Python treats them as escape sequences.
total = 0
for root, dirs, files in os.walk(folder):
    total += len(files)
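The loop above can be wrapped in a small reusable function. This is a minimal sketch; the name count_files_recursive is hypothetical and does not come from the original answers.

```python
import os


def count_files_recursive(folder):
    """Return the number of files under folder, including all subdirectories."""
    # os.walk yields one (root, dirs, files) tuple per directory visited,
    # so summing len(files) over every tuple gives the overall total.
    return sum(len(files) for _root, _dirs, files in os.walk(folder))
```

Calling count_files_recursive('.') would then give the total for the current directory and everything below it.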
Just add an elif statement that takes care of the directories:
import os

def fileCount(folder):
    "count the number of files in a directory and its subdirectories"
    count = 0
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        if os.path.isfile(path):
            count += 1
        elif os.path.isdir(path):
            # Recurse into the subdirectory and add its file count.
            count += fileCount(path)
    return count
- Here are some one-liners using pathlib, which is part of the standard library.
- Use Path.cwd().rglob('*') or Path('some path').rglob('*'), which creates a generator of all the files and directories below that path.
- Unpack the generator with list or *, and use len to get the number of entries.
from pathlib import Path

total_dir_files = len(list(Path.cwd().rglob('*')))
# or
total_dir_files = len([*Path.cwd().rglob('*')])
# or filter for only files using is_file()
file_count = len([f for f in Path.cwd().rglob('*') if f.is_file()])
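Because rglob('*') also yields directories, the unfiltered and filtered one-liners can give different totals. The sketch below illustrates this on a throwaway directory tree; the helper names count_entries and count_files are hypothetical.

```python
import tempfile
from pathlib import Path


def count_entries(root):
    """Count every path under root: files and directories alike."""
    return sum(1 for _ in Path(root).rglob("*"))


def count_files(root):
    """Count only the paths under root that are files."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


# Build a temporary tree with one subdirectory and two files.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "a.txt").touch()
    (base / "sub" / "b.txt").touch()
    entries = count_entries(base)  # sub, a.txt, sub/b.txt
    files = count_files(base)      # a.txt, sub/b.txt

print(entries, files)  # prints: 3 2
```

Using sum over a generator expression avoids building an intermediate list, which may matter for very large trees.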
Here is a time-test for the 3 most popular methods:
import os
from datetime import datetime

dir_path = "D:\\Photos"

# os.listdir
def recursive_call(dir_path):
    folder_array = os.listdir(dir_path)
    files = 0
    folders = 0
    for path in folder_array:
        if os.path.isfile(os.path.join(dir_path, path)):
            files += 1
        elif os.path.isdir(os.path.join(dir_path, path)):
            folders += 1
            file_count, folder_count = recursive_call(os.path.join(dir_path, path))
            files += file_count
            folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.listdir): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.walk
start_time = datetime.now()
file_array = [len(files) for r, d, files in os.walk(dir_path)]
files = sum(file_array)
folders = len(file_array)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.walk): %s seconds" % (datetime.now() - start_time).total_seconds())

# os.scandir
def recursive_call(dir_path):
    folder_array = os.scandir(dir_path)
    files = 0
    folders = 0
    for path in folder_array:
        if path.is_file():
            files += 1
        elif path.is_dir():
            folders += 1
            file_count, folder_count = recursive_call(path)
            files += file_count
            folders += folder_count
    return files, folders

start_time = datetime.now()
files, folders = recursive_call(dir_path)
print("\nFolders: %d, Files: %d" % (folders, files))
print("Time Taken (os.scandir): %s seconds" % (datetime.now() - start_time).total_seconds())
Folders: 53, Files: 29048
Time Taken (os.listdir): 3.074945 seconds

Folders: 53, Files: 29048
Time Taken (os.walk): 0.062022 seconds

Folders: 53, Files: 29048
Time Taken (os.scandir): 0.048984 seconds

While os.walk is the most elegant, a recursive os.scandir implementation appears to be the fastest in this test.
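A variant of the os.scandir approach is sketched below. It assumes that symbolic links should not be followed, closes each directory handle with a with-block, and times calls with time.perf_counter instead of datetime. The names count_tree and timed are hypothetical and not part of the original test.

```python
import os
import time


def count_tree(dir_path):
    """Return (files, folders) found below dir_path, not counting dir_path itself."""
    files = folders = 0
    # The with-block closes the directory iterator as soon as the loop ends.
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                folders += 1
                sub_files, sub_folders = count_tree(entry.path)
                files += sub_files
                folders += sub_folders
            elif entry.is_file(follow_symlinks=False):
                files += 1
    return files, folders


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, `(files, folders), seconds = timed(count_tree, dir_path)` would produce the same kind of figures as the test above. Timings depend heavily on the file system and on caching, so results from a single run are only indicative.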
How do I read the number of files in a folder using Python?
To count files and directories non-recursively you can use os.listdir and take its length.
To count files and directories recursively you can use os.walk to iterate over the files and subdirectories in the directory.
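The two approaches just described can be sketched as follows. The function names are hypothetical; both count files and directories together.

```python
import os


def count_entries_flat(path):
    """Count files and directories directly inside path (non-recursive)."""
    return len(os.listdir(path))


def count_entries_deep(path):
    """Count files and directories at every level below path (recursive)."""
    # Each os.walk tuple lists the subdirectories and files of one directory.
    return sum(len(dirs) + len(files) for _root, dirs, files in os.walk(path))
```

For a directory containing one file and one subdirectory that itself holds one file, the first function returns 2 and the second returns 3.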
If you only want to count files, not directories, you can use os.listdir and os.path.isfile to check whether each entry is a file:
import os.path
path = '.'
num_files = len([f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))])
Or alternatively using a generator:
num_files = sum(os.path.isfile(os.path.join(path, f)) for f in os.listdir(path))
Or you can use os.walk as follows:
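A minimal sketch of one way to do this with os.walk, assuming that only the files directly inside the directory should be counted. The helper name count_top_level_files is hypothetical.

```python
import os


def count_top_level_files(path):
    """Count files directly inside path using the first os.walk result."""
    # The first tuple yielded by os.walk describes path itself; its third
    # element is the list of non-directory names found there.
    # The default covers the case where os.walk yields nothing,
    # for example when path cannot be listed.
    _root, _dirs, files = next(os.walk(path), (path, [], []))
    return len(files)
```

Taking only the first tuple stops the walk before it descends into any subdirectory.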
I found some of these ideas from this thread.
pathlib, which is new in Python 3.4, makes life easier. The line labelled 1 produces a non-recursive listing of the current folder; the line labelled 2 produces a recursive listing.
from pathlib import Path
import os

os.chdir('c:/utilities')
print(len(list(Path('.').glob('*'))))     ## 1
print(len(list(Path('.').glob('**/*'))))  ## 2
There are more goodies too. With these additional lines you can see both the absolute and relative file names for those items that are files.
for item in Path('.').glob('*'):
    if item.is_file():
        print(str(item), str(item.absolute()))

boxee.py c:\utilities\boxee.py
boxee_user_catalog.sqlite c:\utilities\boxee_user_catalog.sqlite
find RSS.py c:\utilities\find RSS.py
MyVideos34.sqlite c:\utilities\MyVideos34.sqlite
newsletter-1 c:\utilities\newsletter-1
notes.txt c:\utilities\notes.txt
README c:\utilities\README
saveHighlighted.ahk c:\utilities\saveHighlighted.ahk
saveHighlighted.ahk.bak c:\utilities\saveHighlighted.ahk.bak
temp.htm c:\utilities\temp.htm
to_csv.py c:\utilities\to_csv.py
>>> import glob
>>> print(len(glob.glob('/tmp/*')))
10
Or, as Mark Byers suggests in his answer, if you only want files:
>>> import os
>>> print([f for f in glob.glob('/tmp/*') if os.path.isfile(f)])
['/tmp/foo']
>>> print(sum(os.path.isfile(f) for f in glob.glob('/tmp/*')))
1
It should be noted that os.listdir('.') includes hidden files (names starting with a dot), whereas glob('./*') does not.
@lunaryorn: If you want hidden files in the current directory, use glob('.*'). If you want everything including hidden files, use glob('.*') + glob('*').
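The difference between os.listdir and the two glob patterns can be checked on a temporary directory. This is a small sketch; the file names are made up for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one visible file and one hidden (dot-prefixed) file.
    for name in ("visible.txt", ".hidden"):
        open(os.path.join(tmp, name), "w").close()

    listed = sorted(os.listdir(tmp))                 # includes the hidden file
    visible = glob.glob(os.path.join(tmp, "*"))      # skips the hidden file
    hidden = glob.glob(os.path.join(tmp, ".*"))      # only the hidden file

print(listed)                 # prints: ['.hidden', 'visible.txt']
print(len(visible))           # prints: 1
print(len(visible + hidden))  # prints: 2
```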
Mark Byers's answer is simple, elegant, and in keeping with the Python spirit.
There’s a problem, however: if you try to run it for any directory other than '.', it will fail, because os.listdir() returns the names of the files, not their full paths. The two coincide when listing the current working directory, so the error goes undetected in the code above.
For example, if you are at /home/me and you list /tmp, you’ll get (say) ['flashXVA67']. With the method above you’ll be testing /home/me/flashXVA67 instead of /tmp/flashXVA67.
You can fix this using os.path.join(), like this:
import os.path
path = './whatever'
count = len([f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))])
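The pitfall and the fix can be reproduced with a temporary directory. In this sketch the file gets a random name (via uuid) so that it is very unlikely to exist in the current working directory as well; the variable names are only illustrative.

```python
import os
import tempfile
import uuid

with tempfile.TemporaryDirectory() as tmp:
    unique_name = uuid.uuid4().hex + ".txt"
    open(os.path.join(tmp, unique_name), "w").close()
    names = os.listdir(tmp)  # bare names, without the directory part

    # Bare names are resolved against the current working directory,
    # so the file in tmp is not found.
    naive = sum(os.path.isfile(name) for name in names)

    # Joining each name with the directory tests the intended path.
    joined = sum(os.path.isfile(os.path.join(tmp, name)) for name in names)

print(naive, joined)
```

Here naive is expected to be 0 and joined to be 1, which mirrors the /home/me versus /tmp example above.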
Also, if you’re going to be doing this count a lot and require performance, you may want to do it without generating additional lists. Here’s a less elegant, unpythonesque yet efficient solution:
import os

def fcount(path):
    """Count the number of files in a directory."""
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1
    return count

# The following lines print the number of files in the given directory:
path = "./whatever"
print(fcount(path))