Python get files by date

How to use the Boto3 library in Python to get a list of files from S3 based on the last modified date using an AWS resource?

Problem Statement − Use the boto3 library in Python to get a list of files from S3 that were modified after a given date timestamp.

Example − List test.zip from Bucket_1/testfolder in S3 if it was modified after 2021-01-21 13:19:56.986445+00:00.

Approach/Algorithm to solve this problem

Step 1 − Import boto3 and botocore exceptions to handle exceptions.

Step 2 − s3_files_path and last_modified_timestamp are the two parameters of the function list_all_objects_based_on_last_modified. last_modified_timestamp should be in the format “2021-01-22 13:19:56.986445+00:00”, including the UTC offset. boto3 returns LastModified as a timezone-aware datetime in UTC, irrespective of geographical location.

Step 3 − Validate that s3_files_path is in the AWS format s3://bucket_name/key.

Step 4 − Create an AWS session using boto3 library.

Step 5 − Create an AWS resource for S3.

Step 6 − List all the objects under the given prefix using the client's list_objects function, and handle the exceptions, if any.

Step 7 − The result of the above call is a dictionary that holds the file-level information under the key ‘Contents’ (the key is absent when no object matches the prefix). Iterate over the entries in ‘Contents’.

Step 8 − Each entry is itself a dictionary with all the details of one file. Fetch its LastModified value and compare it with the given date timestamp.

Step 9 − If LastModified is at or after the given timestamp, save the complete file name; otherwise, ignore it.

Step 10 − Return the list of files that were modified at or after the given date timestamp.
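The steps above rely on the low-level list_objects call, which returns at most 1,000 keys per request. As a hedged alternative that stays with the AWS resource interface, the sketch below iterates over Bucket.objects.filter(), which pages through the results automatically; each ObjectSummary it yields exposes key and last_modified. The helper names split_s3_path, filter_by_last_modified and list_files_modified_since are hypothetical, and boto3 is imported only inside the function that actually talks to AWS, so the pure helpers can be tried without credentials.

```python
from datetime import datetime


def split_s3_path(s3_path):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix/')."""
    if not s3_path.startswith("s3://"):
        raise ValueError(f"Not a valid S3 path: {s3_path!r}")
    bucket, _, prefix = s3_path[len("s3://"):].partition("/")
    prefix = prefix.strip("/")
    # A trailing slash keeps 'testfolder' from also matching 'testfolder2/...'
    return bucket, (prefix + "/" if prefix else "")


def filter_by_last_modified(objects, cutoff):
    """Yield the keys of (key, last_modified) pairs modified at or after cutoff.

    Both last_modified and cutoff must be timezone-aware datetimes.
    """
    for key, last_modified in objects:
        if last_modified >= cutoff:
            yield key


def list_files_modified_since(s3_path, timestamp):
    """Hypothetical helper: return s3:// URIs modified at or after timestamp."""
    import boto3  # imported here so the helpers above work without boto3

    bucket_name, prefix = split_s3_path(s3_path)
    cutoff = datetime.fromisoformat(timestamp)  # e.g. "2021-01-21 13:19:56.986445+00:00"
    bucket = boto3.resource("s3").Bucket(bucket_name)
    # objects.filter() pages through every matching key, not just the first 1,000
    summaries = ((obj.key, obj.last_modified) for obj in bucket.objects.filter(Prefix=prefix))
    return [f"s3://{bucket_name}/{key}" for key in filter_by_last_modified(summaries, cutoff)]
```

For example, split_s3_path("s3://Bucket_1/testfolder") returns ('Bucket_1', 'testfolder/'). Keeping the date filter in a pure function also means it can be checked against hand-made (key, datetime) pairs without touching AWS.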

Example

The following code gets the list of files from AWS S3 based on the last modified date timestamp −

import boto3
from botocore.exceptions import ClientError
from datetime import datetime


def list_all_objects_based_on_last_modified(s3_files_path, last_modified_timestamp):
    # Validate that the path looks like s3://bucket_name/key
    if not s3_files_path.startswith('s3://'):
        raise ValueError('Given path is not a valid s3 path.')

    session = boto3.session.Session()
    s3_resource = session.resource('s3')

    # 's3://bucket/a/b' -> ['s3:', '', 'bucket', 'a', 'b']
    bucket_token = s3_files_path.split('/')
    bucket = bucket_token[2]
    prefix = '/'.join(bucket_token[3:]).strip('/')
    if prefix:
        prefix += '/'

    # Parse the cut-off once so it is compared as a datetime, not as a string
    cutoff = datetime.fromisoformat(str(last_modified_timestamp))

    try:
        # Note: list_objects returns at most 1,000 keys per call
        result = s3_resource.meta.client.list_objects(Bucket=bucket, Prefix=prefix)
    except ClientError as e:
        raise Exception(
            "boto3 client error in list_all_objects_based_on_last_modified function: " + str(e)) from e
    except Exception as e:
        raise Exception(
            "Unexpected error in list_all_objects_based_on_last_modified function of s3 helper: " + str(e)) from e

    filtered_file_names = []
    # 'Contents' is missing from the response when no object matches the prefix
    for obj in result.get('Contents', []):
        # obj["LastModified"] is a timezone-aware datetime in UTC
        if obj["LastModified"] >= cutoff:
            full_s3_file = "s3://" + bucket + "/" + obj["Key"]
            filtered_file_names.append(full_s3_file)
    return filtered_file_names


# give a timestamp to fetch test.zip
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder", "2021-01-21 13:19:56.986445+00:00"))
# give a timestamp no file is modified after
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder", "2021-01-22 13:19:56.986445+00:00"))

Output

['s3://Bucket_1/testfolder/test.zip']
[]
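Comparing timestamps as strings orders them correctly only when both strings use exactly the same layout and UTC offset. The short illustration below uses made-up timestamps to show two cases where a string comparison and a datetime comparison disagree; parsing with datetime.fromisoformat() (Python 3.7+) and comparing timezone-aware datetimes avoids the problem.

```python
from datetime import datetime

# Made-up timestamps: 14:00 at UTC+01:00 is 13:00 UTC, i.e. earlier than 13:30 UTC
a = "2021-01-21 14:00:00+01:00"
b = "2021-01-21 13:30:00+00:00"

# A string comparison only looks at characters: "14" > "13"
print(a >= b)  # True (misleading)

# Timezone-aware datetimes take the offset into account
print(datetime.fromisoformat(a) >= datetime.fromisoformat(b))  # False

# A different date/time separator also breaks it: 'T' sorts after ' '
c = "2021-01-21T13:00:00+00:00"
print(c >= b)  # True (misleading)
print(datetime.fromisoformat(c) >= datetime.fromisoformat(b))  # False
```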


Python: Get list of files in directory sorted by date and time

In this article, we will discuss different ways to get a list of all files in a directory/folder sorted by date and time in Python.


Get list of files in directory sorted by date using glob()

In Python, the glob module provides a function glob() to find files in a directory whose paths match a pattern. Following Unix shell path-expansion rules, the pattern can contain wildcards such as *, ? and [...] (shell-style wildcards, not regular expressions) to match a few or all files/sub-directories in a directory. We will use this to get a list of all files in a directory, sorted by last modification time. Steps are as follows,

  1. Get a list of all files & directories in the given directory using glob().
  2. Using the filter() function and os.path.isfile(), select only the files from the list.
  3. Sort the list of files based on last modification time using sorted() function.
    • For this, use os.path.getmtime() as the key argument in the sorted() function.

Complete example to get a list of all files in directory sorted by last modification datetime is as follows,

import glob
import os
import time

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of all files only in the given directory
list_of_files = filter(os.path.isfile, glob.glob(dir_name + '*'))

# Sort list of files based on last modification time in ascending order
list_of_files = sorted(list_of_files, key=os.path.getmtime)

# Iterate over sorted list of files and print file path
# along with last modification time of file
for file_path in list_of_files:
    timestamp_str = time.strftime('%m/%d/%Y :: %H:%M:%S',
                                  time.gmtime(os.path.getmtime(file_path)))
    print(timestamp_str, ' -->', file_path)

Output

10/06/2018 :: 04:34:06 --> C:/Program Files/Java/jdk1.8.0_191\COPYRIGHT
10/06/2018 :: 04:34:08 --> C:/Program Files/Java/jdk1.8.0_191\src.zip
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\LICENSE
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\README.html
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\THIRDPARTYLICENSEREADME-JAVAFX.txt
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\THIRDPARTYLICENSEREADME.txt
11/18/2018 :: 09:42:12 --> C:/Program Files/Java/jdk1.8.0_191\javafx-src.zip
11/18/2018 :: 09:42:19 --> C:/Program Files/Java/jdk1.8.0_191\release
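The same listing can also be sketched with pathlib instead of glob and os.path. This is an alternative, not the method used above: Path.iterdir() yields the entries of the directory, Path.is_file() keeps only files, and Path.stat().st_mtime (the same value os.path.getmtime() returns) is the sort key. The function name files_sorted_by_mtime is hypothetical.

```python
from pathlib import Path


def files_sorted_by_mtime(dir_name):
    """Return Path objects for the files (not sub-directories) in dir_name, oldest first."""
    files = [p for p in Path(dir_name).iterdir() if p.is_file()]
    # st_mtime is the last modification time in seconds since the epoch
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

For example, [p.name for p in files_sorted_by_mtime('C:/Program Files/Java/jdk1.8.0_191/')] would give just the file names in the same order, since each Path carries both the full path and its name.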

In this solution we created a list of files in a folder, sorted by date. But the list contains the complete path of the files. What if we want only file names in sorted order by date and time?

Get list of files in directory sorted by date using os.listdir()

In Python, the os module provides a function listdir(dir_path), which returns a list of the file and directory names in the given directory path. Using the filter() function and os.path.isfile(), select only the files from that list; because listdir() returns bare names, each name must first be joined to the directory path with os.path.join(). Then we can sort this list of file names by last modification time, using os.path.getmtime() (again on the joined path) as the key argument of the sorted() function.

Complete example to get list of files in directory sorted by last modification datetime is as follows,

import os
import time

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of all files only in the given directory
list_of_files = filter(lambda x: os.path.isfile(os.path.join(dir_name, x)),
                       os.listdir(dir_name))

# Sort list of files based on last modification time in ascending order
list_of_files = sorted(list_of_files,
                       key=lambda x: os.path.getmtime(os.path.join(dir_name, x)))

# Iterate over sorted list of files and print file name
# along with last modification time of file
for file_name in list_of_files:
    file_path = os.path.join(dir_name, file_name)
    timestamp_str = time.strftime('%m/%d/%Y :: %H:%M:%S',
                                  time.gmtime(os.path.getmtime(file_path)))
    print(timestamp_str, ' -->', file_name)

Output

10/06/2018 :: 04:34:06 --> COPYRIGHT
10/06/2018 :: 04:34:08 --> src.zip
11/18/2018 :: 09:42:11 --> LICENSE
11/18/2018 :: 09:42:11 --> README.html
11/18/2018 :: 09:42:11 --> THIRDPARTYLICENSEREADME-JAVAFX.txt
11/18/2018 :: 09:42:11 --> THIRDPARTYLICENSEREADME.txt
11/18/2018 :: 09:42:12 --> javafx-src.zip
11/18/2018 :: 09:42:19 --> release

In this solution, we created a list of file names in a folder sorted by date. The key argument of sorted() is not a comparator: it is a function that sorted() calls once on each item to compute the value to sort by. Because listdir() returns bare names, we passed a lambda that joins each name to dir_name and returns os.path.getmtime() of the result, so the names are sorted by last modification time.
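The key function is also where the sort order can be adjusted. As a minimal sketch (the function name newest_first is hypothetical), os.scandir() yields DirEntry objects whose stat() result is cached on the entry, so each file's modification time is read at most once; passing reverse=True lists the most recently modified file first.

```python
import os


def newest_first(dir_name):
    """Return the names of the files in dir_name, most recently modified first."""
    with os.scandir(dir_name) as entries:
        # is_file() excludes sub-directories
        files = [entry for entry in entries if entry.is_file()]
        # key computes one value per item; reverse=True flips ascending to descending
        files.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    return [entry.name for entry in files]
```

Dropping reverse=True gives the same ascending order as the examples above.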

Python: Get list of files in directory and sub-directories sorted by date

In both previous examples, we created a list of files in a directory sorted by date, but only the files directly inside the given directory were covered, not those in nested directories. To get a list of files in a directory and all its sub-directories sorted by date, check out the following example,

import glob
import os
import time

dir_name = 'C:/Program Files/Java/jdk1.8.0_191/'

# Get list of all files in the given directory and in all its sub-directories;
# recursive=True is needed for '**' to match any number of directory levels
list_of_files = filter(os.path.isfile,
                       glob.glob(dir_name + '/**/*', recursive=True))

# Sort list of files based on last modification time in ascending order
list_of_files = sorted(list_of_files, key=os.path.getmtime)

# Iterate over sorted list of files and print file path
# along with last modification date time
for file_path in list_of_files:
    timestamp_str = time.strftime('%m/%d/%Y :: %H:%M:%S',
                                  time.gmtime(os.path.getmtime(file_path)))
    print(timestamp_str, ' -->', file_path)

Output

11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\bin\appletviewer.exe
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\bin\extcheck.exe
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\bin\idlj.exe
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\include\jdwpTransport.h
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\include\jni.h
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\include\jvmti.h
11/18/2018 :: 09:42:11 --> C:/Program Files/Java/jdk1.8.0_191\include\jvmticmlr.h
11/18/2018 :: 09:42:13 --> C:/Program Files/Java/jdk1.8.0_191\jre\COPYRIGHT
11/18/2018 :: 09:42:13 --> C:/Program Files/Java/jdk1.8.0_191\jre\Welcome.html
11/18/2018 :: 09:42:13 --> C:/Program Files/Java/jdk1.8.0_191\lib\ant-javafx.jar
11/18/2018 :: 09:42:14 --> C:/Program Files/Java/jdk1.8.0_191\lib\ct.sym
11/18/2018 :: 09:42:14 --> C:/Program Files/Java/jdk1.8.0_191\lib\dt.jar
11/18/2018 :: 09:42:14 --> C:/Program Files/Java/jdk1.8.0_191\lib\jvm.lib
11/18/2018 :: 09:42:17 --> C:/Program Files/Java/jdk1.8.0_191\lib\orb.idl
11/18/2018 :: 09:42:17 --> C:/Program Files/Java/jdk1.8.0_191\lib\packager.jar
11/18/2018 :: 09:42:17 --> C:/Program Files/Java/jdk1.8.0_191\lib\sa-jdi.jar
11/18/2018 :: 09:42:26 --> C:/Program Files/Java/jdk1.8.0_191\lib\tools.jar
11/18/2018 :: 09:42:26 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\plugin.jar
11/18/2018 :: 09:42:27 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\javaws.jar
11/18/2018 :: 09:42:27 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\deploy.jar
11/18/2018 :: 09:42:31 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\rt.jar
11/18/2018 :: 09:42:32 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\jsse.jar
11/18/2018 :: 09:42:32 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\charsets.jar
11/18/2018 :: 09:42:32 --> C:/Program Files/Java/jdk1.8.0_191\jre\lib\ext\localedata.jar
11/18/2018 :: 09:42:34 --> C:/Program Files/Java/jdk1.8.0_191\jre\bin\server\classes.jsa

We used the glob() function with the pattern ‘/**/*’ and the recursive=True argument. With recursive=True, ‘**’ matches zero or more levels of directories, so glob() returned every file and directory in the given directory and all its sub-directories; filter(os.path.isfile, ...) then kept only the files. Finally, passing os.path.getmtime() as the key argument of sorted() gave us a list of files sorted by date and time.
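To see why recursive=True matters, the sketch below builds a small throw-away directory tree (the file names top.txt, sub/mid.txt and sub/deep/low.txt are invented for illustration) and globs it with and without the flag. Without it, ‘**’ behaves like a plain ‘*’ and matches exactly one directory level; with it, files at every depth, including the top level, are found. The helper name relative_matches is hypothetical.

```python
import glob
import os
import tempfile


def relative_matches(root, recursive):
    """Return the sorted paths, relative to root, of files matched by root/**/*."""
    pattern = os.path.join(root, '**', '*')
    matches = glob.glob(pattern, recursive=recursive)
    return sorted(os.path.relpath(p, root) for p in matches if os.path.isfile(p))


with tempfile.TemporaryDirectory() as root:
    # Invented layout: top.txt, sub/mid.txt, sub/deep/low.txt
    os.makedirs(os.path.join(root, 'sub', 'deep'))
    for rel in ('top.txt',
                os.path.join('sub', 'mid.txt'),
                os.path.join('sub', 'deep', 'low.txt')):
        open(os.path.join(root, rel), 'w').close()

    shallow = relative_matches(root, recursive=False)
    deep = relative_matches(root, recursive=True)
    print(shallow)  # on POSIX: ['sub/mid.txt']
    print(deep)     # on POSIX: ['sub/deep/low.txt', 'sub/mid.txt', 'top.txt']
```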

We learned about different ways to get a list of files in a folder, sorted by date & time.
