Getting Every File in a Windows Directory
I have a folder in Windows 7 which contains multiple .txt files. How would one get every file in said directory as a list?
You can use os.listdir(".") to list the contents of the current directory ("."):

import os

for name in os.listdir("."):
    if name.endswith(".txt"):
        print(name)
If you want the whole list as a Python list, use a list comprehension:
a = [name for name in os.listdir(".") if name.endswith(".txt")]
import os
import glob

os.chdir('c:/mydir')
files = glob.glob('*.txt')
You can also do this without the unnecessary, unasked-for side effect of changing the current working directory, by passing the full path to glob instead of calling os.chdir first.
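A minimal sketch of that variant, reusing the c:/mydir path from the snippet above:

```python
import glob

# Pass the directory straight to glob instead of calling os.chdir() first;
# the result is a list of matching paths (empty if the directory doesn't exist).
files = glob.glob('c:/mydir/*.txt')
```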
None of the answers here address the fact that if you pass glob.glob() a Windows path (for example, C:\okay\what\i_guess\ ), it does not run as expected. Instead, you need to use pathlib:
from pathlib import Path

glob_path = Path(r"C:\okay\what\i_guess")
file_list = [str(pp) for pp in glob_path.glob("**/*.txt")]
import fnmatch
import os

# fnmatch matches filenames against a shell-style pattern
files = [file for file in os.listdir('.') if fnmatch.fnmatch(file, '*.txt')]
If you just need the current directory, use os.listdir.
>>> os.listdir('.')  # get the files/directories
>>> [os.path.abspath(x) for x in os.listdir('.')]  # gets the absolute paths
>>> [x for x in os.listdir('.') if os.path.isfile(x)]  # only files
>>> [x for x in os.listdir('.') if x.endswith('.txt')]  # files ending in .txt only
You can also use os.walk if you need to recursively get the contents of a directory. Refer to the Python documentation for os.walk.
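A minimal recursive sketch with os.walk, collecting every .txt file under the current directory (the starting directory is just an example):

```python
import os

txt_files = []
for dirpath, dirnames, filenames in os.walk('.'):
    for name in filenames:
        if name.endswith('.txt'):
            # dirpath is the directory currently being walked
            txt_files.append(os.path.join(dirpath, name))
```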
How to open every file in a folder
I have a Python script parse.py, which opens a file, say file1, and then does something, maybe print out the total number of characters.
filename = 'file1'
f = open(filename, 'r')
content = f.read()
print(filename, len(content))
However, I don't want to do this file by file manually. Is there a way to take care of every single file automatically? Like
ls | awk '' | python parse.py >> output
Then the problem is how I could read the file name from standard input. Or are there already some built-in functions that do the ls and that kind of work easily? Thanks!
You can list all files in the current directory using os.listdir:

import os

for filename in os.listdir(os.getcwd()):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in read-only mode
        # do your stuff
        pass
Or you can list only some files, depending on the file pattern, using the glob module:

import glob
import os

for filename in glob.glob('*.txt'):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in read-only mode
        # do your stuff
        pass
It doesn't have to be the current directory; you can list files in any path you want:

import glob
import os

path = '/some/path/to/file'
for filename in glob.glob(os.path.join(path, '*.txt')):
    # glob already returns the joined path, so open filename directly
    with open(filename, 'r') as f:  # open in read-only mode
        # do your stuff
        pass
Or you can even use the pipe, as you specified, using fileinput:

import fileinput

for line in fileinput.input():
    # do your stuff
    pass
And you can then use it with piping, e.g. ls | python parse.py >> output.
Does this handle the file opening and closing automatically too? I'm surprised you're not using with ... as ... statements. Could you clarify?
Charlie, glob.glob and os.listdir return the filenames. You would then open those one by one within the loop.
You should try using os.walk.
import os

yourpath = 'path'
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        # do stuff with the file
    for name in dirs:
        print(os.path.join(root, name))
        # do stuff with the directory
I was looking for this answer:
import glob
import os

folder_path = '/some/path/to/file'
for filename in glob.glob(os.path.join(folder_path, '*.htm')):
    with open(filename, 'r') as f:
        text = f.read()
    print(filename)
    print(len(text))
You can also choose '*.txt' or any other ending for your filenames.
You can actually just use the os module to do both:
Here’s a simple example:
import os  # os module imported here

location = os.getcwd()  # get present working directory location here
counter = 0  # keep a count of all files found
csvfiles = []  # list to store all csv files found at location
filebeginwithhello = []  # list to keep all files that begin with 'hello'
otherfiles = []  # list to keep any other file that does not match the criteria

for file in os.listdir(location):
    if file.startswith("hello") and file.endswith(".csv"):
        # checked first, because some files may start with hello and also be a csv file
        print("csv file found:\t", file)
        csvfiles.append(str(file))
        counter = counter + 1
    elif file.endswith(".csv"):
        print("csv file found:\t", file)
        csvfiles.append(str(file))
        counter = counter + 1
    elif file.startswith("hello"):
        print("hello file found:\t", file)
        filebeginwithhello.append(file)
        counter = counter + 1
    else:
        otherfiles.append(file)
        counter = counter + 1

if counter == 0:
    print("No files found here!")
print("Total files found:\t", counter)
Now you have not only listed all the files in a folder but also have them (optionally) sorted by starting name, file type, and more. Just iterate over each list and do your stuff.
How to get files in a directory, including all subdirectories
If you want to search in a different directory from ".", you could pass the directory as sys.argv[1] and call os.walk(sys.argv[1]).
If you want to exclude a certain directory, e.g., old_logs, you can simply remove it from dirnames and it won't be searched: if "old_logs" in dirnames: dirnames.remove("old_logs")
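Putting those two tips together, a minimal sketch (the old_logs name comes from the comment above; defaulting to "." when no argument is given is an assumption):

```python
import os
import sys

# Use the directory from the command line if given, else the current directory
start = sys.argv[1] if len(sys.argv) > 1 else '.'

found = []
for dirpath, dirnames, filenames in os.walk(start):
    if "old_logs" in dirnames:
        dirnames.remove("old_logs")  # pruned in place: os.walk won't descend into it
    for filename in filenames:
        found.append(os.path.join(dirpath, filename))
```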
Since Python 3, print is a function and must be called like this: print(os.path.join(dirpath, filename))
You can also use the glob module along with os.walk.
import os
from glob import glob

files = []
start_dir = os.getcwd()
pattern = "*.log"

for dir, _, _ in os.walk(start_dir):
    files.extend(glob(os.path.join(dir, pattern)))
@nueverest os.walk returns a 3-tuple (dirpath, dirnames, filenames) at each iteration, and we're only interested in dirpath (assigned to dir above); the underscores are just used as placeholders for the other 2 values we're not interested in (i.e. dirnames, and then filenames, are being assigned to the variable _, which we will never use).
Why run glob and do extra I/O, when you already have the list of filenames which you could filter with fnmatch.filter ?
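A sketch of that fnmatch.filter variant, reusing start_dir and pattern from the snippet above; it filters the filename lists os.walk already returns instead of issuing extra glob calls:

```python
import fnmatch
import os

files = []
start_dir = os.getcwd()
pattern = "*.log"

for dirpath, _, filenames in os.walk(start_dir):
    # fnmatch.filter matches the in-memory name list; no extra directory I/O
    for name in fnmatch.filter(filenames, pattern):
        files.append(os.path.join(dirpath, name))
```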
Check out the Python Recursive Directory Walker. In short, os.listdir() and os.walk() are your friends.
A single-line solution using only a (nested) list comprehension:
import os
path_list = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk('.') for filename in filenames if filename.endswith('.log')]
This «one-liner» is excessive. If you’re going over 79 characters (see PEP 8), it takes away from readability and should either be split into multiple lines or made into a function (preferred).
That’s true, I posted this mostly for the simplicity & list comprehension. It’s indeed nice to split this over multiple lines.
import os

for logfile in os.popen('find . -type f -name *.log').read().split('\n')[0:-1]:
    print(logfile)
import subprocess

(out, err) = subprocess.Popen(
    ["find", ".", "-type", "f", "-name", "*.log"],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # get text instead of bytes on Python 3
).communicate()
for logfile in out.split('\n')[0:-1]:
    print(logfile)
These two take advantage of find . -type f -name *.log.
The first one is simpler but not guaranteed to be whitespace-safe once -name *.log is added, though it worked fine for simply find ../testdata -type f (in my OS X environment).
The second one, using subprocess, seems more complicated, but it is the whitespace-safe one (again, in my OS X environment).
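A sketch of the same pipeline with subprocess.run (Python 3.5+; the find flags are the ones above), which keeps the whitespace safety of the list-argument form:

```python
import subprocess

# run() wraps Popen/communicate; universal_newlines=True decodes stdout to str
result = subprocess.run(
    ["find", ".", "-type", "f", "-name", "*.log"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
logfiles = [line for line in result.stdout.split('\n') if line]
```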
Listing of all files in directory?
Can anybody help me create a function which will create a list of all files under a certain directory by using the pathlib library? Here, I have:
- c:\desktop\test\A\A.txt
- c:\desktop\test\B\B_1\B.txt
- c:\desktop\test\123.txt
I expected to have a single list which would have the paths above, but my code returns a nested list.
from pathlib import Path

def searching_all_files(directory: Path):
    file_list = []  # A list for storing files existing in directories
    for x in directory.iterdir():
        if x.is_file():
            file_list.append(x)
        else:
            file_list.append(searching_all_files(directory/x))
    return file_list

p = Path('C:\\Users\\akrio\\Desktop\\Test')
print(searching_all_files(p))
I hope somebody can correct me.
Use Path.glob() to list all files and directories, and then filter the results in a list comprehension:
from pathlib import Path

p = Path(r'C:\Users\akrio\Desktop\Test').glob('**/*')
files = [x for x in p if x.is_file()]
More from the pathlib module:
With pathlib, it is as simple as the command below.
from pathlib import Path

path = Path('C:\\Users\\akrio\\Desktop\\Test')
list(path.iterdir())
Need only a list of files (not dirs)? One-liner: [f for f in Path(path_to_dir).iterdir() if f.is_file()]
Wrong. iterdir lists only the files in the directory but the OP has made it plain that he/she wants an explorer which will search down through the whole structure.
from pathlib import Path
from pprint import pprint

def searching_all_files(directory):
    dirpath = Path(directory)
    assert dirpath.is_dir()
    file_list = []
    for x in dirpath.iterdir():
        if x.is_file():
            file_list.append(x)
        elif x.is_dir():
            file_list.extend(searching_all_files(x))
    return file_list

pprint(searching_all_files('.'))
assert is a statement, not a function, so I think you want assert dirpath.is_dir() with no parentheses, in both Python 2 and 3. Or simply assert dirpath.exists().
If you can assume that only file objects have a . in the name (e.g., .txt, .png, etc.), you can do a glob or recursive glob search.
from pathlib import Path

# Search the directory
list(Path('testDir').glob('*.*'))

# Search directories and subdirectories, recursively
list(Path('testDir').rglob('*.*'))
But that’s not always the case. Sometimes there are hidden directories like .ipynb_checkpoints and files that do not have extensions. In that case, use list comprehension or a filter to sort out the Path objects that are files.
# Search Single Directory
list(filter(lambda x: x.is_file(), Path('testDir').iterdir()))

# Search Directories Recursively
list(filter(lambda x: x.is_file(), Path('testDir').rglob('*')))
# Search Single Directory
[x for x in Path('testDir').iterdir() if x.is_file()]

# Search Directories Recursively
[x for x in Path('testDir').rglob('*') if x.is_file()]