Python walk all files in directory

Python 3: List the Contents of a Directory, Including Recursively

This article shows how to list the files and directories inside a directory using Python 3. Throughout this article, we’ll refer to the following example directory structure:

We’ll assume the code examples will be saved in script.py above, and will be run from inside the mydir directory so that the relative path '.' always refers to mydir.

Using pathlib (Python 3.4 and up)

Non-Recursive

iterdir

To list the contents of a directory using Python 3.4 or higher, we can use the built-in pathlib library’s iterdir() to iterate through the contents. In our example directory, we can write in script.py:

    from pathlib import Path

    for p in Path('.').iterdir():
        print(p)

When we run from inside mydir, we should see output like:

Because iterdir is non-recursive, it only lists the immediate contents of mydir and not the contents of subdirectories (like a1.html).


Note that each item returned by iterdir is also a pathlib.Path, so we can call any pathlib.Path method on the object. For example, to resolve each item as an absolute path, we can write in script.py:

    from pathlib import Path

    for p in Path('.').iterdir():
        print(p.resolve())

This will list the resolved absolute path of each item instead of just the filenames.

Because iterdir returns a generator object (meant to be used in loops), if we want to store the results in a list variable, we can write:

    from pathlib import Path

    files = list(Path('.').iterdir())
    print(files)
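Since every item is a pathlib.Path, we can also separate files from subdirectories while iterating. The following is a minimal sketch rather than part of the original example; the helper name split_entries is our own, and the results are sorted because iterdir yields entries in arbitrary order:

```python
from pathlib import Path


def split_entries(root):
    """Return (files, dirs): sorted names of the entries directly inside root."""
    files, dirs = [], []
    for p in Path(root).iterdir():
        # is_dir() follows symlinks; in this sketch anything that is not
        # a directory is counted as a file
        if p.is_dir():
            dirs.append(p.name)
        else:
            files.append(p.name)
    return sorted(files), sorted(dirs)


if __name__ == '__main__':
    files, dirs = split_entries('.')
    print('files:', files)
    print('dirs: ', dirs)
```

Run from inside mydir, this prints the names of the immediate files and subdirectories on two separate lines.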

glob

We can also use pathlib.Path.glob to list all files (the equivalent of iterdir):

    from pathlib import Path

    for p in Path('.').glob('*'):
        print(p)

Filename Pattern Matching with glob

If we want to filter our results using Unix glob command-style pattern matching, glob can handle that too. For example, if we only want to list .html files, we would write in script.py:

    from pathlib import Path

    for p in Path('.').glob('*.html'):
        print(p)

As with iterdir, glob returns a generator object, so we’ll have to use list() if we want to convert it to a list:

    from pathlib import Path

    files = list(Path('.').glob('*.html'))
    print(files)
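A single glob pattern covers one filename shape at a time. If we want several extensions at once, one possible approach is to glob everything and filter on each path's suffix. This is only a sketch: list_by_suffix is a hypothetical helper, and the case-insensitive comparison is our own choice, not something glob does for us:

```python
from pathlib import Path


def list_by_suffix(root, suffixes):
    """Return sorted names of files directly in root whose suffix is in suffixes."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p.name
        for p in Path(root).glob('*')
        # skip directories, then compare suffixes case-insensitively
        if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == '__main__':
    print(list_by_suffix('.', {'.html', '.htm'}))
```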

Recursive

To recursively list the entire directory tree rooted at a particular directory (including the contents of subdirectories), we can use rglob. In script.py, we can write:

    from pathlib import Path

    for p in Path('.').rglob('*'):
        print(p)

This time, when we run script.py from inside mydir, we should see output like:

rglob is the equivalent of calling glob with **/ at the beginning of the path, so the following code is equivalent to the rglob code we just saw:

    from pathlib import Path

    for p in Path('.').glob('**/*'):
        print(p)

Filename Pattern Matching with rglob

Just as with glob, rglob also allows glob-style pattern matching, but automatically does so recursively. In our example, to list all *.html files in the directory tree rooted at mydir, we can write in script.py:

    from pathlib import Path

    for p in Path('.').rglob('*.html'):
        print(p)

This should display all and only .html files, including those inside subdirectories:

Since rglob is the same as calling glob with **/, we could also just use glob to achieve the same result:

    from pathlib import Path

    for p in Path('.').glob('**/*.html'):
        print(p)
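When the results need to be stored rather than printed, it can help to record each match relative to the starting directory. Below is a small sketch under that assumption; find_html is a name we made up, and as_posix() is used only so the output looks the same on every platform:

```python
from pathlib import Path


def find_html(root):
    """Return sorted paths, relative to root, of every .html file below root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob('*.html')
        # a directory could also be named something.html, so keep files only
        if p.is_file()
    )


if __name__ == '__main__':
    for rel_path in find_html('.'):
        print(rel_path)
```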

Not Using pathlib

Non-Recursive

os.listdir

On any version of Python 3, we can use the built-in os library to list directory contents. In script.py, we can write:

    import os

    for filename in os.listdir('.'):
        print(filename)

Unlike with pathlib, os.listdir simply returns filenames as strings, so we can’t call methods like .resolve() on the result items. To get full paths, we have to build them manually:

    import os

    root = '.'
    for filename in os.listdir(root):
        relative_path = os.path.join(root, filename)
        absolute_path = os.path.abspath(relative_path)
        print(absolute_path)

Another difference from pathlib is that os.listdir returns a list of strings, so we don’t need to call list() on the result to convert it to a list:

    import os

    files = os.listdir('.')  # files is a list
    print(files)

glob

Also available on all versions of Python 3 is the built-in glob library, which provides Unix glob command-style filename pattern matching.

To list all items in a directory (similar to os.listdir, except that glob patterns skip names starting with a dot), we can write in script.py:

    import glob

    for filename in glob.glob('./*'):
        print(filename)

This will produce output like:

Note that the root directory ('.' in our example) is simply included in the path pattern passed into glob.glob().
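A quick way to see how the two listings treat hidden files is to run both on the same directory and compare. The sketch below assumes nothing beyond the standard library; compare_listings is a hypothetical helper, and glob.escape is used so that special characters in the directory name are not read as pattern syntax:

```python
import glob
import os


def compare_listings(root):
    """Return (names from os.listdir, names from glob.glob), both sorted."""
    from_listdir = sorted(os.listdir(root))
    # '*' does not match names that begin with a dot
    pattern = os.path.join(glob.escape(root), '*')
    from_glob = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return from_listdir, from_glob


if __name__ == '__main__':
    listed, globbed = compare_listings('.')
    print('only in os.listdir:', sorted(set(listed) - set(globbed)))
```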

Filename Pattern Matching with glob

To list only .html files, we can write in script.py:

    import glob

    for filename in glob.glob('./*.html'):
        print(filename)

Recursive

Since glob.glob only gained a recursive option in Python 3.5, and pathlib.Path.rglob has been available since Python 3.4, we’ll skip recursive examples of glob.glob here.

os.walk

On any version of Python 3, we can use os.walk to list all the contents of a directory recursively.

os.walk() returns a generator object that can be used with a for loop. Each iteration yields a 3-tuple that represents a directory in the directory tree:

  • current_dir: the path of the directory that the current iteration represents;
  • subdirs: list of names (strings) of immediate subdirectories of current_dir; and
  • files: list of names (strings) of files inside current_dir.
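Because files holds bare names, a full path has to be built by joining each name with current_dir. The following sketch shows one way to do that; iter_file_paths is our own illustrative helper, not part of the os module:

```python
import os


def iter_file_paths(root):
    """Yield the path of every file below root, prefixed with its directory."""
    for current_dir, subdirs, files in os.walk(root):
        for filename in files:
            # current_dir already starts with root, so one join is enough
            yield os.path.join(current_dir, filename)


if __name__ == '__main__':
    for path in iter_file_paths('.'):
        print(path)
```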

In our example, we can write in script.py:

    import os

    for current_dir, subdirs, files in os.walk('.'):
        # Current Iteration Directory
        print(current_dir)

        # Directories
        for dirname in subdirs:
            print('\t' + dirname)

        # Files
        for filename in files:
            print('\t' + filename)

This produces the following output:


Python List Files in a Directory

In this article, we will see how to list all files of a directory in Python. There are multiple ways to list the files of a directory. In this article, we will use the following four methods.

  • os.listdir('dir_path') : Returns the list of files and directories present in a specified directory path.
  • os.walk('dir_path') : Recursively gets the list of all files in a directory and its subdirectories.
  • os.scandir('path') : Returns directory entries along with file attribute information.
  • glob.glob('pattern') : Uses the glob module to list files and folders whose names follow a specific pattern.


How to List All Files of a Directory

Getting a list of the files in a directory is easy as pie! Use the listdir() function of the os module together with os.path.isfile() to list all files of a directory. Here are the steps.

  1. Import the os module. This module provides functions for interacting with the operating system.
  2. Use the os.listdir() function. The os.listdir('path') function returns a list containing the names of the files and directories present in the directory given by path.
  3. Iterate over the result. Use a for loop to iterate over the names returned by the listdir() function.
  4. Use the isfile() function. In each loop iteration, use the os.path.isfile('path') function to check whether the current path is a file or a directory. This function returns True if the given path is a file and False otherwise. If it is a file, add it to a list.

Example to List Files of a Directory

Let’s see how to list the files of an ‘account’ folder. The listdir() function only lists the entries of the given directory itself; it does not descend into subdirectories.

Example 1: List only files from a directory

    import os

    # folder path
    dir_path = r'E:\account'

    # list to store files
    res = []

    # Iterate directory
    for path in os.listdir(dir_path):
        # check if current path is a file
        if os.path.isfile(os.path.join(dir_path, path)):
            res.append(path)
    print(res)

Here we got three file names.

['profit.txt', 'sales.txt', 'sample.txt']

If you are familiar with generators, you can make the code shorter and simpler using a generator function, as shown below.

Generator function:

    import os

    def get_files(path):
        for file in os.listdir(path):
            if os.path.isfile(os.path.join(path, file)):
                yield file

Then simply call it whenever required.

    for file in get_files(r'E:\account'):
        print(file)

Example 2: List both files and directories.

Directly call the listdir('path') function to get the contents of a directory.

    import os

    # folder path
    dir_path = r'E:\account'

    # list files and directories
    res = os.listdir(dir_path)
    print(res)

As you can see in the output, ‘reports_2021’ is a directory.

['profit.txt', 'reports_2021', 'sales.txt', 'sample.txt']

os.walk() to list all files in directory and subdirectories

The os.walk() function returns a generator that yields a tuple of values (current_path, directories in current_path, files in current_path) for each directory it visits.

Note: Using the os.walk() function we can list all directories, subdirectories, and files in a given directory.

It walks the tree recursively: each step of the generator moves on to the next directory, until every subdirectory below the initial directory has been visited.

For example, calling os.walk('path') yields, for each directory it visits, the path of that directory plus two lists. The first list contains the names of its subdirectories, and the second list contains the names of its files.

Let’s see an example that lists all files in a directory and its subdirectories.

    from os import walk

    # folder path
    dir_path = r'E:\account'

    # list to store file names
    res = []
    for (current_path, dir_names, file_names) in walk(dir_path):
        res.extend(file_names)
    print(res)

['profit.txt', 'sales.txt', 'sample.txt', 'december_2021.txt']

Note: Add break inside the loop to stop after the top-level directory, so that files inside subdirectories are not collected.

    from os import walk

    # folder path
    dir_path = r'E:\account'
    res = []
    for (current_path, dir_names, file_names) in walk(dir_path):
        res.extend(file_names)
        # don't look inside any subdirectory
        break
    print(res)
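A break stops the walk entirely. To skip only some subdirectories, os.walk in its default top-down mode lets us edit the directory list in place, and it will not descend into the names we remove. The sketch below is an illustration only: walk_skipping and the 'archive' folder name are hypothetical and do not come from the example above.

```python
import os


def walk_skipping(root, skip_names):
    """Return sorted file names below root, ignoring the skipped directories."""
    found = []
    for current_path, dir_names, file_names in os.walk(root):
        # Slice assignment changes the very list os.walk will recurse into;
        # rebinding dir_names to a new list would have no effect.
        dir_names[:] = [d for d in dir_names if d not in skip_names]
        found.extend(file_names)
    return sorted(found)


if __name__ == '__main__':
    # 'archive' is a made-up directory name used for illustration
    print(walk_skipping('.', {'archive'}))
```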

os.scandir() to get files of a directory

The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.

It returns an iterator of os.DirEntry objects, each of which carries the name and path of one entry.

    import os

    # get all files inside a specific folder
    dir_path = r'E:\account'
    for entry in os.scandir(dir_path):
        if entry.is_file():
            print(entry.name)

profit.txt
sales.txt
sample.txt
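The attribute information is available through each entry's stat() method. As a rough sketch of how that might be used, the hypothetical helper below (file_sizes is our own name) collects the size of every regular file, and wraps scandir in a with block, which is supported from Python 3.6 onwards:

```python
import os


def file_sizes(dir_path):
    """Return {file name: size in bytes} for the files directly in dir_path."""
    sizes = {}
    # the with block closes the scandir iterator as soon as we are done
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes


if __name__ == '__main__':
    for name, size in sorted(file_sizes('.').items()):
        print(name, size)
```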

Glob Module to list Files of a Directory

The Python glob module, part of the Python Standard Library, is used to find the files and folders whose names follow a specific pattern.

For example, to get all files of a directory, we will use the dir_path/*.* pattern. Here, *.* matches any name that contains a dot, which in practice means a file with any extension.

Let’s see how to list files from a directory using the glob module.

    import glob

    # search all files inside a specific folder
    # *.* means file name with any extension
    dir_path = r'E:\account\*.*'
    res = glob.glob(dir_path)
    print(res)

['E:\\account\\profit.txt', 'E:\\account\\sales.txt', 'E:\\account\\sample.txt']
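Keep in mind that *.* is a name pattern, not a file test: it misses files that have no extension and it would match a folder whose name contains a dot. If that matters, one option is to glob with * and check each result with os.path.isfile(). This is a sketch under that assumption, with list_files as a made-up helper name:

```python
import glob
import os


def list_files(dir_path):
    """Return sorted names of the files in dir_path, with or without extension."""
    # glob.escape keeps special characters in dir_path from acting as wildcards
    pattern = os.path.join(glob.escape(dir_path), '*')
    return sorted(
        os.path.basename(p) for p in glob.glob(pattern) if os.path.isfile(p)
    )


if __name__ == '__main__':
    print(list_files('.'))
```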

Note: If you want to list files from subdirectories, use ** in the pattern and set the recursive argument to True.

    import glob

    # search all files inside a specific folder and its subfolders
    # *.* means file name with any extension
    dir_path = r'E:\account\**\*.*'
    for file in glob.glob(dir_path, recursive=True):
        print(file)

E:\account\profit.txt
E:\account\sales.txt
E:\account\sample.txt
E:\account\reports_2021\december_2021.txt

Pathlib Module to list files of a directory

From Python 3.4 onwards, we can use the pathlib module, which provides an object-oriented interface to filesystem paths.

  • Import the pathlib module: the pathlib module offers classes and methods to handle filesystem paths and get data related to files on different operating systems.
  • Next, use pathlib.Path('path') to construct the directory path.
  • Next, use iterdir() to iterate over all entries of the directory.
  • In the end, check whether the current entry is a file using its is_file() method.

    import pathlib

    # folder path
    dir_path = r'E:\account'

    # to store file paths
    res = []

    # construct path object
    d = pathlib.Path(dir_path)

    # iterate directory
    for entry in d.iterdir():
        # check if it is a file
        if entry.is_file():
            res.append(entry)
    print(res)


About Vishal

I’m Vishal Hule, Founder of PYnative.com. I am a Python developer, and I love to write articles to help students, developers, and learners.
