Getting Every File in a Windows Directory
I have a folder in Windows 7 which contains multiple .txt files. How would one get every file in said directory as a list?
You can use os.listdir(".") to list the contents of the current directory ("."):

import os

for name in os.listdir("."):
    if name.endswith(".txt"):
        print(name)
If you want the whole list as a Python list, use a list comprehension:
a = [name for name in os.listdir(".") if name.endswith(".txt")]
import os
import glob

os.chdir('c:/mydir')
files = glob.glob('*.txt')
You can also do this without the unnecessary, unasked-for side effect of changing the current working directory, by passing the full path to glob instead of calling os.chdir first.
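A minimal sketch of that variant, reusing the c:/mydir path from the snippet above:

```python
import glob

# Pass the directory straight to glob instead of calling os.chdir() first;
# the result is a list of matching paths (empty if the directory doesn't exist).
files = glob.glob('c:/mydir/*.txt')
```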
None of the answers here address the fact that if you pass glob.glob() a Windows path (for example, C:\okay\what\i_guess\ ), it does not run as expected. Instead, you need to use pathlib:
from pathlib import Path

glob_path = Path(r"C:\okay\what\i_guess")
file_list = [str(pp) for pp in glob_path.glob("**/*.txt")]
import fnmatch
import os

# fnmatch matches filenames against a shell-style pattern
files = [file for file in os.listdir('.') if fnmatch.fnmatch(file, '*.txt')]
If you just need the current directory, use os.listdir.
>>> os.listdir('.')  # get the files/directories
>>> [os.path.abspath(x) for x in os.listdir('.')]  # gets the absolute paths
>>> [x for x in os.listdir('.') if os.path.isfile(x)]  # only files
>>> [x for x in os.listdir('.') if x.endswith('.txt')]  # files ending in .txt only
You can also use os.walk if you need to recursively get the contents of a directory. Refer to the Python documentation for os.walk.
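A minimal recursive sketch with os.walk, collecting every .txt file under the current directory (the starting directory is just an example):

```python
import os

txt_files = []
for dirpath, dirnames, filenames in os.walk('.'):
    for name in filenames:
        if name.endswith('.txt'):
            # dirpath is the directory currently being walked
            txt_files.append(os.path.join(dirpath, name))
```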
How to open every file in a folder
I have a Python script parse.py, which opens a file, say file1, and then does something, maybe print out the total number of characters.
filename = 'file1'
f = open(filename, 'r')
content = f.read()
print(filename, len(content))
However, I don't want to do this file by file manually. Is there a way to take care of every single file automatically? Like
ls | awk '' | python parse.py >> output
Then the problem is how I could read the file name from standard input. Or are there already some built-in functions that do the ls and that kind of work easily? Thanks!
You can list all files in the current directory using os.listdir:

import os

for filename in os.listdir(os.getcwd()):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in read-only mode
        # do your stuff
        pass
Or you can list only some files, depending on the file pattern, using the glob module:

import glob
import os

for filename in glob.glob('*.txt'):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in read-only mode
        # do your stuff
        pass
It doesn't have to be the current directory; you can list files in any path you want:

import glob
import os

path = '/some/path/to/file'
for filename in glob.glob(os.path.join(path, '*.txt')):
    # glob already returns the joined path, so open filename directly
    with open(filename, 'r') as f:  # open in read-only mode
        # do your stuff
        pass
Or you can even use the pipe, as you specified, using fileinput:

import fileinput

for line in fileinput.input():
    # do your stuff
    pass
And you can then use it with piping, e.g. ls | python parse.py >> output.
Does this handle the file opening and closing automatically too? I'm surprised you're not using with ... as ... statements. Could you clarify?
Charlie, glob.glob and os.listdir return the filenames. You would then open those one by one within the loop.
You should try using os.walk.
import os

yourpath = 'path'
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        # do stuff with the file
    for name in dirs:
        print(os.path.join(root, name))
        # do stuff with the directory
I was looking for this answer:
import glob
import os

folder_path = '/some/path/to/file'
for filename in glob.glob(os.path.join(folder_path, '*.htm')):
    with open(filename, 'r') as f:
        text = f.read()
    print(filename)
    print(len(text))
You can also choose '*.txt' or any other ending for your filenames.
You can actually just use the os module to do both:
Here’s a simple example:
import os  # os module imported here

location = os.getcwd()  # get present working directory location here
counter = 0  # keep a count of all files found
csvfiles = []  # list to store all csv files found at location
filebeginwithhello = []  # list to keep all files that begin with 'hello'
otherfiles = []  # list to keep any other file that does not match the criteria

for file in os.listdir(location):
    if file.startswith("hello") and file.endswith(".csv"):
        # checked first, because some files may start with hello and also be a csv file
        print("csv file found:\t", file)
        csvfiles.append(str(file))
        counter = counter + 1
    elif file.endswith(".csv"):
        print("csv file found:\t", file)
        csvfiles.append(str(file))
        counter = counter + 1
    elif file.startswith("hello"):
        print("hello file found:\t", file)
        filebeginwithhello.append(file)
        counter = counter + 1
    else:
        otherfiles.append(file)
        counter = counter + 1

if counter == 0:
    print("No files found here!")
print("Total files found:\t", counter)
Now you have not only listed all the files in a folder but also have them (optionally) sorted by starting name, file type, and more. Just iterate over each list and do your stuff.
How to get files in a directory, including all subdirectories
If you want to search in a different directory from ".", you could pass the directory as sys.argv[1] and call os.walk(sys.argv[1]).
If you want to exclude a certain directory, e.g., old_logs, you can simply remove it from dirnames and it won't be searched: if "old_logs" in dirnames: dirnames.remove("old_logs")
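Putting those two tips together, a minimal sketch (the old_logs name comes from the comment above; defaulting to "." when no argument is given is an assumption):

```python
import os
import sys

# Use the directory from the command line if given, else the current directory
start = sys.argv[1] if len(sys.argv) > 1 else '.'

found = []
for dirpath, dirnames, filenames in os.walk(start):
    if "old_logs" in dirnames:
        dirnames.remove("old_logs")  # pruned in place: os.walk won't descend into it
    for filename in filenames:
        found.append(os.path.join(dirpath, filename))
```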
Since Python 3, print is a function and must be called like this: print(os.path.join(dirpath, filename))
You can also use the glob module along with os.walk.
import os
from glob import glob

files = []
start_dir = os.getcwd()
pattern = "*.log"

for dir, _, _ in os.walk(start_dir):
    files.extend(glob(os.path.join(dir, pattern)))
@nueverest os.walk returns a 3-tuple (dirpath, dirnames, filenames) at each iteration, and we're only interested in dirpath (assigned to dir above); the underscores are just used as placeholders for the other 2 values we're not interested in (i.e. dirnames, and then filenames, are being assigned to the variable _, which we will never use).
Why run glob and do extra I/O, when you already have the list of filenames which you could filter with fnmatch.filter ?
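A sketch of that fnmatch.filter variant, reusing start_dir and pattern from the snippet above; it filters the filename lists os.walk already returns instead of issuing extra glob calls:

```python
import fnmatch
import os

files = []
start_dir = os.getcwd()
pattern = "*.log"

for dirpath, _, filenames in os.walk(start_dir):
    # fnmatch.filter matches the in-memory name list; no extra directory I/O
    for name in fnmatch.filter(filenames, pattern):
        files.append(os.path.join(dirpath, name))
```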
Check out the Python Recursive Directory Walker. In short, os.listdir() and os.walk() are your friends.
A single-line solution using only a (nested) list comprehension:
import os
path_list = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk('.') for filename in filenames if filename.endswith('.log')]
This «one-liner» is excessive. If you’re going over 79 characters (see PEP 8), it takes away from readability and should either be split into multiple lines or made into a function (preferred).
That’s true, I posted this mostly for the simplicity & list comprehension. It’s indeed nice to split this over multiple lines.
import os

for logfile in os.popen('find . -type f -name *.log').read().split('\n')[0:-1]:
    print(logfile)
import subprocess

(out, err) = subprocess.Popen(
    ["find", ".", "-type", "f", "-name", "*.log"],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # get text instead of bytes on Python 3
).communicate()
for logfile in out.split('\n')[0:-1]:
    print(logfile)
These two take advantage of find . -type f -name *.log.
The first one is simpler but not guaranteed to be whitespace-safe once -name *.log is added, though it worked fine for simply find ../testdata -type f (in my OS X environment).
The second one, using subprocess, seems more complicated, but it is the whitespace-safe one (again, in my OS X environment).
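A sketch of the same pipeline with subprocess.run (Python 3.5+; the find flags are the ones above), which keeps the whitespace safety of the list-argument form:

```python
import subprocess

# run() wraps Popen/communicate; universal_newlines=True decodes stdout to str
result = subprocess.run(
    ["find", ".", "-type", "f", "-name", "*.log"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
logfiles = [line for line in result.stdout.split('\n') if line]
```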
Listing of all files in directory?
Can anybody help me create a function which will create a list of all files under a certain directory by using the pathlib library? Here, I have:
- c:\desktop\test\A\A.txt
- c:\desktop\test\B\B_1\B.txt
- c:\desktop\test\123.txt
I expected to have a single list which would have the paths above, but my code returns a nested list.
from pathlib import Path

def searching_all_files(directory: Path):
    file_list = []  # A list for storing files existing in directories
    for x in directory.iterdir():
        if x.is_file():
            file_list.append(x)
        else:
            file_list.append(searching_all_files(directory/x))
    return file_list

p = Path('C:\\Users\\akrio\\Desktop\\Test')
print(searching_all_files(p))
I hope somebody can correct me.
Use Path.glob() to list all files and directories, and then filter the results in a list comprehension:
from pathlib import Path

p = Path(r'C:\Users\akrio\Desktop\Test').glob('**/*')
files = [x for x in p if x.is_file()]
More from the pathlib module:
With pathlib, it is as simple as the command below.
from pathlib import Path

path = Path('C:\\Users\\akrio\\Desktop\\Test')
list(path.iterdir())
Need only a list of files (not dirs)? One-liner: [f for f in Path(path_to_dir).iterdir() if f.is_file()]
Wrong. iterdir lists only the files in the directory but the OP has made it plain that he/she wants an explorer which will search down through the whole structure.
from pathlib import Path
from pprint import pprint

def searching_all_files(directory):
    dirpath = Path(directory)
    assert dirpath.is_dir()
    file_list = []
    for x in dirpath.iterdir():
        if x.is_file():
            file_list.append(x)
        elif x.is_dir():
            file_list.extend(searching_all_files(x))
    return file_list

pprint(searching_all_files('.'))
assert is a statement, not a function, so I think you want assert dirpath.is_dir() with no parentheses, in both Python 2 and 3. Or simply assert dirpath.exists().
If you can assume that only file objects have a . in the name (e.g., .txt, .png, etc.), you can do a glob or recursive glob search.
from pathlib import Path

# Search the directory
list(Path('testDir').glob('*.*'))

# Search directories and subdirectories, recursively
list(Path('testDir').rglob('*.*'))
But that’s not always the case. Sometimes there are hidden directories like .ipynb_checkpoints and files that do not have extensions. In that case, use list comprehension or a filter to sort out the Path objects that are files.
# Search Single Directory
list(filter(lambda x: x.is_file(), Path('testDir').iterdir()))

# Search Directories Recursively
list(filter(lambda x: x.is_file(), Path('testDir').rglob('*')))
# Search Single Directory
[x for x in Path('testDir').iterdir() if x.is_file()]

# Search Directories Recursively
[x for x in Path('testDir').rglob('*') if x.is_file()]