Python foreach files in directory

How to iterate over files in a directory in Python

This tutorial shows several ways to iterate over the files in a given directory and act on them using Python.

1. Using os.listdir()

This method returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

Example: print the paths of all files with a .jpg or .png extension in the C:\Users\admin directory.

import os

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    # endswith() accepts a tuple of suffixes
    if filename.endswith((".jpg", ".png")):
        print(os.path.join(directory, filename))
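Because os.listdir() returns names in arbitrary order, sorting them can help when a stable order matters. The following is a minimal sketch that assumes a throwaway temporary directory with made-up file names instead of C:\Users\admin, so that it runs on any machine:

```python
import os
import tempfile

# Build a throwaway directory with hypothetical file names for the demo.
with tempfile.TemporaryDirectory() as directory:
    for name in ("b.png", "a.jpg", "notes.txt"):
        open(os.path.join(directory, name), "w").close()

    # sorted() gives a stable alphabetical order; endswith() accepts a tuple.
    images = [
        name
        for name in sorted(os.listdir(directory))
        if name.endswith((".jpg", ".png"))
    ]

print(images)  # ['a.jpg', 'b.png']
```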

2. Using os.scandir()

Since Python 3.5, things are much easier with os.scandir(). This example does the same thing as the one above, but it uses os.scandir() instead of os.listdir().

import os

directory = r'C:\Users\admin'
for entry in os.scandir(directory):
    if entry.path.endswith((".jpg", ".png")) and entry.is_file():
        print(entry.path)

Both the os.listdir() and os.scandir() approaches list only the files and directories immediately under a directory. To list files and folders recursively, consider the methods below.

3. Using os.walk()

This method iterates over all files in the directory and its subdirectories. The example below does the same as the ones above, but it recursively prints all images under the C:\Users\admin directory.

import os

for subdir, dirs, files in os.walk(r'C:\Users\admin'):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        if filepath.endswith((".jpg", ".png")):
            print(filepath)

4. Using glob module

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

Let's consider an example that lists all png and pdf files in the C:\Users\admin directory.

import glob

# Print png images in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.png'):
    print(filepath)

# Print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.pdf'):
    print(filepath)

By default, glob.iglob lists only files immediately under the given directory. To list files in nested folders recursively, set the recursive parameter to True and use the ** pattern, which then matches any number of nested directories.

import glob

# Recursively print png images in folder C:\Users\admin\
# (the ** segment is what makes recursive=True descend into subfolders)
for filepath in glob.iglob(r'C:\Users\admin\**\*.png', recursive=True):
    print(filepath)

# Recursively print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\**\*.pdf', recursive=True):
    print(filepath)

You can use either glob.iglob or glob.glob. The difference is that glob.iglob returns an iterator that yields the paths matching a pathname pattern, while glob.glob returns a list.
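The difference can be checked directly. This is a small sketch, assuming a temporary directory with made-up file names, that compares what the two functions return:

```python
import glob
import os
import tempfile
from collections.abc import Iterator

# Throwaway directory with hypothetical file names for the demo.
with tempfile.TemporaryDirectory() as directory:
    for name in ("one.png", "two.png", "three.pdf"):
        open(os.path.join(directory, name), "w").close()

    pattern = os.path.join(directory, "*.png")

    as_list = glob.glob(pattern)       # all matches collected into a list
    as_iterator = glob.iglob(pattern)  # matches produced lazily, one at a time

    list_type = type(as_list).__name__
    is_lazy = isinstance(as_iterator, Iterator)
    names = sorted(os.path.basename(p) for p in as_iterator)

print(list_type, is_lazy, names)  # list True ['one.png', 'two.png']
```

An iterator can be consumed only once, so call glob.glob (or wrap the iterator in list()) if you need to loop over the results several times.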

5. Iterate recursively using Path class from pathlib module

The code below does the same as the example above, recursively listing and printing the png images in a folder, but it uses pathlib.Path.

from pathlib import Path

# Raw string, so the backslashes are not read as escape sequences
paths = Path(r'C:\Users\admin').glob('**/*.png')
for path in paths:
    # path is a Path object, not a string
    path_in_str = str(path)
    # Do something with the path
    print(path_in_str)

How to loop through each file in directory in Python

Python is a computer programming language that is easy to learn and use. It is one of the most popular programming languages out there. In this digital age, where everyone is looking for ways to automate their business, Python is on the rise.

One of the many things that Python developers have to do over and over again is looping through files in a directory.

In this article, we will show you a few different ways to loop over each file in a directory, both with and without importing additional packages.

Iterate over files in a given directory using os.path

The os module provides a unified interface to many common operating system features across different platforms.

Depending on the platform on which the program is running, the os module automatically loads the right implementation (whether it’s posix or nt) and performs the proper system call.

os is part of the Python standard library, so using it adds no external dependencies. Below is a small code snippet that iterates through the files in a given directory and prints their names using os.listdir().

import os

for filename in os.listdir('/home/linuxpip'):
    if filename.endswith(".py"):
        print(filename)

Alternatively, you can use os.walk() to loop through files in a directory. Just remember that os.walk() yields a 3-tuple for each directory it visits: dirpath, dirnames, and filenames.

import os

if __name__ == "__main__":
    for (root, dirs, files) in os.walk('/home/linuxpip', topdown=True):
        print("The files are: ")
        print(files)
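One practical use of topdown=True is that the dirnames list can be edited in place to stop os.walk() from descending into certain subdirectories. The sketch below assumes a temporary tree with a hypothetical src folder and a hidden .git folder, and skips the hidden one:

```python
import os
import tempfile

# Throwaway tree: the folder and file names are made up for the demo.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "src"))
    os.makedirs(os.path.join(top, ".git"))
    open(os.path.join(top, "src", "main.py"), "w").close()
    open(os.path.join(top, ".git", "config"), "w").close()

    found = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Editing dirs in place tells os.walk() which subdirectories to skip.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for filename in files:
            found.append(os.path.relpath(os.path.join(root, filename), top))

print(found)  # only the file under src; .git is never visited
```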

Iterate over files in a given directory using pathlib

pathlib (Python 3.4+) is the newer way to interact with the filesystem in a unified way.

Why pathlib when you already have os, you may ask. The problem is that os treats paths as plain strings, so you can’t get further details about a specific path without writing a few more lines of code. Besides that, the os module doesn’t natively let you find paths that match a given pattern inside a hierarchy. Plus, pathlib offers a much more streamlined approach to managing and interacting with filesystem paths across different operating systems.
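To illustrate what "further details" means here, the following sketch reads a few attributes straight off a path object. The path itself is hypothetical, and PurePosixPath is used so that nothing needs to exist on disk:

```python
from pathlib import PurePosixPath

# PurePosixPath only manipulates the path text; the file is hypothetical.
path = PurePosixPath("/home/linuxpip/project/report.tar.gz")

print(path.name)    # report.tar.gz
print(path.stem)    # report.tar
print(path.suffix)  # .gz
print(path.parent)  # /home/linuxpip/project
```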

To loop over the files in a given directory, you can simply use Path.iterdir().

from pathlib import Path

my_dir = Path("/home/linuxpip")
# iterdir() must be called on the Path object, not on the built-in dir
for path in my_dir.iterdir():
    print(path)

On older Python versions, even Python 2, you can still use pathlib by installing a backport with pip.
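Note that Path.iterdir() yields directories as well as files. A minimal sketch, assuming a temporary directory with made-up entries, that keeps only the .py files:

```python
import tempfile
from pathlib import Path

# Throwaway directory; the entry names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    my_dir = Path(tmp)
    (my_dir / "script.py").touch()
    (my_dir / "readme.txt").touch()
    (my_dir / "package").mkdir()

    # iterdir() yields files and directories alike, so filter explicitly.
    python_files = sorted(
        p.name for p in my_dir.iterdir() if p.is_file() and p.suffix == ".py"
    )

print(python_files)  # ['script.py']
```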

pathlib vs os.path

pathlib has a more intuitive syntax, whereas I find os.path old and clunky at times. A Path object can perform filesystem operations by calling its own methods, while you need to call a number of different os.path functions to do the same thing. pathlib lets you move up a path easily with the parent attribute, whereas os.path has to work with directory names and path strings. On top of that, pathlib lets you iterate over directories and match patterns natively, which os.path does not. Finally, every Path object has many useful methods and attributes for performing filesystem operations or reading file attributes, for which you would otherwise need additional modules such as glob or shutil alongside os.path.

For example, joining paths with os.path looks like this:

os.path.join(os.getcwd(), "processed_data", "output.xlsx")

With pathlib, you can simply use the / operator to join paths, which really improves code readability.

from pathlib import Path

Path.cwd() / "processed_data" / "output.xlsx"
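As a quick check, the sketch below builds the same location both ways and confirms that the results agree. It only constructs paths and assumes nothing about whether the file exists:

```python
import os
from pathlib import Path

joined_with_os = os.path.join(os.getcwd(), "processed_data", "output.xlsx")
joined_with_pathlib = Path.cwd() / "processed_data" / "output.xlsx"

# Both approaches describe the same location.
same = Path(joined_with_os) == joined_with_pathlib
print(same)  # True

# Moving up one level: .parent versus os.path.dirname()
print(joined_with_pathlib.parent.name)                     # processed_data
print(os.path.basename(os.path.dirname(joined_with_os)))   # processed_data
```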

We hope this article helped you learn how to loop through files in a directory efficiently. We’ve also written a few other guides for fixing common Python errors, such as Timeout in Python requests, Python Unresolved Import in VSCode or “IndexError: List Index Out of Range” in Python. If you have any suggestions, please feel free to leave a comment below.

How to iterate over files in directory using Python with example code

When working with files and folders, you may want to iterate over all files in a directory. In this tutorial, we will show you how to do this using Python. You will also learn how to iterate over only the files with a given extension, such as CSV or JSON.

Method 1: Using os.walk()

This method is simple and works well for small directories. It can be slow on large directory trees, because it visits every file in the directory and all of its subdirectories. For each directory in the tree rooted at the top directory (including the top directory itself), it yields a 3-tuple (dirpath, dirnames, filenames).

Here is an example of how to use os.walk() to iterate over all files in a directory:

import os

for root, dirs, files in os.walk('/path/to/your-directory'):
    for file in files:
        print(os.path.join(root, file))

If you want to keep only files with a certain extension, you can use the following code:

import os

for root, dirs, files in os.walk('/path/to/your-directory'):
    for file in files:
        if file.endswith('.csv'):
            print(os.path.join(root, file))

The code above prints all files with the .csv extension in the directory and its subdirectories. You may change the extension to your liking.
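To match several extensions at once, str.endswith() also accepts a tuple. The sketch below assumes a temporary tree with made-up file names and a hypothetical EXTENSIONS constant. Lowercasing the name first is an optional choice that makes the match case-insensitive:

```python
import os
import tempfile

EXTENSIONS = (".csv", ".json")  # hypothetical choice for this demo

# Throwaway tree with made-up file names.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "nested"))
    for rel in ("a.csv", "b.txt", os.path.join("nested", "c.json")):
        open(os.path.join(top, rel), "w").close()

    matches = sorted(
        file
        for root, dirs, files in os.walk(top)
        for file in files
        if file.lower().endswith(EXTENSIONS)
    )

print(matches)  # ['a.csv', 'c.json']
```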

Method 2: Using glob()

Python's glob module provides two functions: glob.glob() and glob.iglob(). They do essentially the same job. The main difference is that glob.iglob() returns an iterator, so the matches are not all stored in memory at once, which can be much more efficient for large result sets.

Let's look at an example of using glob.iglob() to loop through files in a directory:

import glob

# A single * matches only files directly inside the directory
for file in glob.iglob('/path/to/your-directory/*.csv'):
    print(file)

The output may look like this:

/path/to/your-directory/file1.csv
/path/to/your-directory/file2.csv
/path/to/your-directory/file3.csv

The above code doesn’t go into subfolders. If you want to go into subfolders, use the ** pattern and set recursive=True in glob.iglob(). The following code recursively goes into subfolders and prints all files with the .csv extension:

import glob

for file in glob.iglob('/path/to/your-directory/**/*.csv', recursive=True):
    print(file)
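The ** pattern only recurses when recursive=True is passed; without it, ** behaves like a single *. A small sketch, assuming a temporary three-level tree with made-up names, shows the difference:

```python
import glob
import os
import tempfile

# Throwaway tree: a.csv, sub/b.csv and sub/inner/c.csv (names are made up).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub", "inner"))
    for rel in ("a.csv",
                os.path.join("sub", "b.csv"),
                os.path.join("sub", "inner", "c.csv")):
        open(os.path.join(top, rel), "w").close()

    pattern = os.path.join(top, "**", "*.csv")

    # Without recursive=True, ** acts like *: exactly one directory level.
    flat = sorted(os.path.basename(p) for p in glob.glob(pattern))
    # With recursive=True, ** matches zero or more directory levels.
    deep = sorted(os.path.basename(p) for p in glob.glob(pattern, recursive=True))

print(flat)  # ['b.csv']
print(deep)  # ['a.csv', 'b.csv', 'c.csv']
```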

Method 3: Using os.listdir()

This method returns a list of the names of the files and subdirectories in a directory. It is similar to glob.glob(), but it doesn’t support patterns or recursive=True. Since it returns both files and subdirectories, we have to check whether each item is a file. Let's look at an example of using os.listdir() to loop through files in a directory:

import os

my_dir = '/path/to/your-directory'
for item in os.listdir(my_dir):
    if os.path.isfile(os.path.join(my_dir, item)):
        print(item)

To keep only files with a specific extension, you have to check the extension of each file, just like in the example code for os.walk(). Let's see an example of using os.listdir() to loop through jpg files in a directory:

import os

my_dir = '/path/to/your-directory'
for item in os.listdir(my_dir):
    if os.path.isfile(os.path.join(my_dir, item)) and item.endswith('.jpg'):
        print(item)

Method 4: Using pathlib.Path()

The glob(pattern) method of pathlib.Path yields all the files in the given directory that match the given pattern. Patterns are the same as for fnmatch, with the addition of “**”, which means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing.

Example: iterate through the files in a directory using Python pathlib:

import pathlib

my_dir = '/path/to/your-directory'
for item in pathlib.Path(my_dir).glob('**/*.csv'):
    print(item)

The above code will print all files with .csv extension in the directory recursively.
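pathlib also provides Path.rglob(pattern), which behaves like glob() with "**/" added in front of the pattern. The following sketch assumes a temporary directory with made-up file names and compares the non-recursive and recursive variants:

```python
import tempfile
from pathlib import Path

# Throwaway tree: top.csv and sub/deep.csv (names are made up).
with tempfile.TemporaryDirectory() as tmp:
    my_dir = Path(tmp)
    (my_dir / "sub").mkdir()
    (my_dir / "top.csv").touch()
    (my_dir / "sub" / "deep.csv").touch()

    only_top = sorted(p.name for p in my_dir.glob("*.csv"))      # this level only
    with_glob = sorted(p.name for p in my_dir.glob("**/*.csv"))  # recursive
    with_rglob = sorted(p.name for p in my_dir.rglob("*.csv"))   # same, shorter

print(only_top)                 # ['top.csv']
print(with_glob == with_rglob)  # True
```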

Method 5: Using os.scandir()

This method returns an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information.

The following example shows a simple use of scandir() to display all the files (excluding directories) in the given path:

import os

my_dir = '/path/to/your-directory'
for item in os.scandir(my_dir):
    if item.is_file():
        # item is an os.DirEntry; print its name rather than the object itself
        print(item.name)
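The attribute information mentioned above is available through each DirEntry. The sketch below assumes a temporary directory containing a made-up 5-byte file, collects file sizes with entry.stat(), and uses scandir() as a context manager (supported since Python 3.6) so that the iterator is closed promptly:

```python
import os
import tempfile

# Throwaway directory; data.bin and subfolder are hypothetical names.
with tempfile.TemporaryDirectory() as my_dir:
    with open(os.path.join(my_dir, "data.bin"), "wb") as fh:
        fh.write(b"12345")
    os.mkdir(os.path.join(my_dir, "subfolder"))

    sizes = {}
    with os.scandir(my_dir) as entries:
        for entry in entries:
            if entry.is_file():
                # stat() on a DirEntry returns the usual os.stat_result
                sizes[entry.name] = entry.stat().st_size

print(sizes)  # {'data.bin': 5}
```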

Conclusion

If you want to list files recursively, in a folder and also in its subfolders, you can use os.walk(), glob.iglob() or pathlib.Path().glob(). If you want to list files in a folder but not in its subfolders, you can use os.listdir() or os.scandir(). Note that os.scandir() is generally more efficient than os.listdir() when you also need file type or attribute information, and that glob.iglob() can be more efficient than pathlib.Path().glob(), although the difference depends on the Python version and is worth measuring if it matters.

glob.iglob() and pathlib.Path().glob() also let you filter files by extension through the pattern itself, without checking how each file name ends.
