
How to Delete Files in S3 Bucket Using Python

In this series of blogs, we are learning how to manage S3 buckets and files using Python. In this tutorial, we will learn how to delete files from an S3 bucket using Python.

Setting up permissions for S3

For this tutorial to work, we will need an IAM user who has access to delete files in S3. We can configure this user on our local machine using the AWS CLI, or we can use its credentials directly in a Python script. We have already covered how to create an IAM user with S3 access. If you do not have this user set up, please follow that blog first and then continue with this one.
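If the IAM user is configured locally through the AWS CLI, boto3 will pick up its credentials automatically. As a minimal sketch of both options (the profile name and key values below are placeholders, not real credentials):

import boto3

# Option 1: use a named profile configured with `aws configure --profile <name>`
session = boto3.session.Session(profile_name="my-profile")  # hypothetical profile name
s3_client = session.client("s3")

# Option 2: pass credentials directly in the script (avoid hard-coding these in production)
s3_client = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",          # placeholder
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",  # placeholder
)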

Delete one file from the S3 bucket

First, we will learn how to delete a single file from the S3 bucket. Below is the code that deletes a single file from the S3 bucket.

import boto3
from pprint import pprint

def delete_object_from_bucket():
    bucket_name = "testbucket-frompython-2"
    file_name = "test9.txt"
    s3_client = boto3.client("s3")
    response = s3_client.delete_object(Bucket=bucket_name, Key=file_name)
    pprint(response)
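One detail worth knowing: S3 deletes are idempotent, so delete_object returns HTTP 204 whether or not the key actually existed. A minimal check on the response (a sketch reusing the names from the function above):

if response["ResponseMetadata"]["HTTPStatusCode"] == 204:
    print(f"Delete request for {file_name} was accepted")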

Deleting multiple files from the S3 bucket

Sometimes we want to delete multiple files from the S3 bucket. Calling the above function multiple times is one option but boto3 has provided us with a better alternative. We can use the “delete_objects” function and pass a list of files to delete from the S3 bucket.

def delete_objects_from_bucket():
    bucket_name = "testbucket-frompython-2"
    s3_client = boto3.client("s3")
    response = s3_client.delete_objects(
        Bucket=bucket_name,
        Delete={
            "Objects": [
                {"Key": "test9.txt"},   # example keys; replace with the
                {"Key": "test10.txt"},  # files you want to delete
            ]
        },
    )
    pprint(response)

We can pass a list of Keys (file names) to this function and it will delete all those files in a single function call.


Delete all files in a folder in the S3 bucket

Now we want to delete all files from one folder in the S3 bucket. A single S3 folder can hold thousands of files. Both of the above approaches work, but they are inefficient and cumbersome when we want to delete thousands of files.

Unfortunately, there is no single function that deletes all files in a folder in S3, but we can build a workaround using the “delete_objects” function.

def delete_all_objects_from_s3_folder():
    """
    This function deletes all files in a folder from an S3 bucket
    :return: None
    """
    bucket_name = "testbucket-frompython-2"
    s3_client = boto3.client("s3")

    # First we list all files in the folder
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="images/")
    files_in_folder = response["Contents"]

    # Build the Key array to pass to the delete_objects function
    files_to_delete = []
    for f in files_in_folder:
        files_to_delete.append({"Key": f["Key"]})

    # This will delete all files in the folder
    response = s3_client.delete_objects(
        Bucket=bucket_name, Delete={"Objects": files_to_delete}
    )
    pprint(response)

As shown in the above code, we first list all the files in the folder and then use that list to delete them. Note that list_objects_v2 returns at most 1000 keys per call, so this version only handles the first 1000 files.
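For folders that can exceed that limit, a paginator keeps the same approach working. Below is a minimal sketch (the bucket and prefix reuse the example values above), which deletes each page of keys as it is listed:

import boto3

def delete_all_objects_paginated(bucket_name="testbucket-frompython-2", prefix="images/"):
    s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # Each page holds at most 1000 keys, which matches the delete_objects limit
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": objects})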

Conclusion

In this tutorial, we have learned how to delete files from an S3 bucket. I hope you have found this useful. You can find the code from this tutorial in the GitHub repo. In the next blog, we will learn how to delete an S3 bucket. See you there 🙂


Amazon S3 boto — how to delete folder?

I created a folder in S3 named "test" and I pushed "test_1.jpg" and "test_2.jpg" into "test". How can I use boto to delete the folder "test"?

8 Answers

Here is the 2018 (almost 2019) version:

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.objects.filter(Prefix="myprefix/").delete()

Someone might find it useful to know that bucket.objects.all().delete() empties the entire bucket without deleting it, no matter how many objects there are (i.e. it’s not affected by the 1000-item limit). See: boto3.amazonaws.com/v1/documentation/api/latest/reference/…

@raz If possible, could you update the answer to reflect/print what is being deleted, and to exit the program if the token expires? Right now it works well, but there’s no way of knowing what is being deleted or what is happening, as I only get a blinking cursor.

There are no folders in S3. Instead, the keys form a flat namespace. However, a key with slashes in its name displays specially in some programs, including the AWS console (see for example Amazon S3 boto — how to create a folder?).

Instead of deleting "a directory", you can (and have to) list files by prefix and delete them. In essence:

for key in bucket.list(prefix='your/directory/'):
    key.delete()

However, the other answers on this page feature more efficient approaches.

Notice that the prefix is matched by plain string search. If the prefix were your/directory, that is, without the trailing slash appended, the program would also happily delete your/directory-that-you-wanted-to-remove-is-definitely-not-this-one.

How do you delete the directory itself? Will the directory be deleted automatically when all files in it are deleted?

How do I delete files in an S3 folder that are 2 days old, in Python? I have this in my S3: bucket/1/backups/ (10 files), and I need to remove all files that are two days old.
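There is no dedicated API for that; the usual approach is to compare each object's LastModified timestamp against a cutoff. A sketch, using the bucket and prefix from the question (the rest is illustrative):

from datetime import datetime, timedelta, timezone
import boto3

s3_client = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=2)

response = s3_client.list_objects_v2(Bucket="bucket", Prefix="1/backups/")
old_objects = [
    {"Key": obj["Key"]}
    for obj in response.get("Contents", [])
    if obj["LastModified"] < cutoff  # LastModified is a timezone-aware UTC datetime
]
if old_objects:
    s3_client.delete_objects(Bucket="bucket", Delete={"Objects": old_objects})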

I feel that it’s been a while and boto3 has a few different ways of accomplishing this goal. This assumes you want to delete the test "folder" and all of its objects. Here is one way:

s3 = boto3.resource('s3')
objects_to_delete = s3.meta.client.list_objects(Bucket="MyBucket", Prefix="myfolder/test/")

delete_keys = {'Objects': []}
delete_keys['Objects'] = [{'Key': k} for k in [obj['Key'] for obj in objects_to_delete.get('Contents', [])]]

s3.meta.client.delete_objects(Bucket="MyBucket", Delete=delete_keys)

This should make two requests, one to fetch the objects in the folder, the second to delete all objects in said folder.

This is the fastest solution, but keep in mind that list_objects can’t return more than 1000 keys so you need to run this code multiple times.

This works great, and you can run it from a Python Lambda by putting the code above in a lambda_handler function, for example as sketched below. Make sure you give your Lambda permission to delete from S3 and extend the timeout.
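A minimal sketch of such a handler (the bucket and prefix are placeholders):

import boto3

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('mybucket')  # placeholder bucket name
    bucket.objects.filter(Prefix="myprefix/").delete()
    return {"statusCode": 200}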

A slight improvement on Patrick’s solution. As you might know, both list_objects() and delete_objects() have an object limit of 1000. This is why you have to paginate the listing and delete in chunks. This is pretty universal, and you can give a Prefix to paginator.paginate() to delete subdirectories/paths.

# credentials and bucket are assumed to be defined elsewhere
client = boto3.client('s3', **credentials)
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=bucket)

delete_us = dict(Objects=[])
for item in pages.search('Contents'):
    delete_us['Objects'].append(dict(Key=item['Key']))

    # flush once aws limit reached
    if len(delete_us['Objects']) >= 1000:
        client.delete_objects(Bucket=bucket, Delete=delete_us)
        delete_us = dict(Objects=[])

# flush rest
if len(delete_us['Objects']):
    client.delete_objects(Bucket=bucket, Delete=delete_us)

And if you want to limit it to a "directory", use the Prefix keyword in paginator.paginate(). See all options: boto3.readthedocs.io/en/latest/reference/services/…

With the Prefix filter suggested by @Chad, I had to add an if item is not None check before deletion (since some of my S3 prefixes did not exist / had no objects).

@dmitraybelyakov, when I run the above code I am getting TypeError: 'NoneType' object is not subscriptable on the following line: delete_us['Objects'].append(dict(Key=item['Key'])). Would you know any reason why it would do that?

You can use bucket.delete_keys() with a list of keys (with a large number of keys I found this to be an order of magnitude faster than using key.delete).

delete_key_list = []
for key in bucket.list(prefix='/your/directory/'):
    delete_key_list.append(key)
    if len(delete_key_list) > 100:
        bucket.delete_keys(delete_key_list)
        delete_key_list = []

if len(delete_key_list) > 0:
    bucket.delete_keys(delete_key_list)

If versioning is enabled on the S3 bucket:

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.object_versions.filter(Prefix="myprefix/").delete()

Is there a way to print some output of what is being deleted? I want to delete the versions first and then the current one, e.g. bucket.objects.filter(Prefix="myprefix/").delete(); right now I only see a blinking cursor and I don’t know what is happening.

You would have to do something like files_to_delete = bucket.object_versions.filter(Prefix="myprefix/"), then iterate over files_to_delete and call print() and then delete() on them.
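In code, that suggestion looks roughly like this (a sketch using the bucket and prefix from the answer above):

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
files_to_delete = bucket.object_versions.filter(Prefix="myprefix/")
for version in files_to_delete:
    print(f"Deleting {version.object_key} (version {version.id})")
    version.delete()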

If one needs to filter by object contents like I did, the following is a blueprint for your logic:

import boto3

def get_s3_objects_batches(s3, **base_kwargs):
    kwargs = dict(MaxKeys=1000, **base_kwargs)
    while True:
        response = s3.list_objects_v2(**kwargs)
        # to yield each and every file instead of batches, use:
        #   yield from response.get('Contents', [])
        yield response.get('Contents', [])
        if not response.get('IsTruncated'):  # at the end of the list?
            break
        kwargs['ContinuationToken'] = response.get('NextContinuationToken')

def your_filter(obj):
    raise NotImplementedError()

# profile_name, bucket_name and prefix are assumed to be defined elsewhere
session = boto3.session.Session(profile_name=profile_name)
s3client = session.client('s3')
for batch in get_s3_objects_batches(s3client, Bucket=bucket_name, Prefix=prefix):
    to_delete = [{'Key': obj['Key']} for obj in batch if your_filter(obj)]
    if to_delete:
        s3client.delete_objects(Bucket=bucket_name, Delete={'Objects': to_delete})


Fastest way to delete files in Amazon S3

This sends one REST API call per file. If you have a large number of files, this can take a long time. Is there a faster way to do this?

3 Answers

The easiest way to delete files is by using Amazon S3 Lifecycle Rules. Simply specify the prefix and an age (e.g. 1 day after creation) and S3 will delete the files for you!

However, this is not necessarily the fastest way to delete them — it might take 24 hours until the rule is executed.
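For reference, such a lifecycle rule can also be created from boto3 itself. A sketch (the bucket name, rule ID, prefix, and age are all illustrative):

import boto3

s3_client = boto3.client("s3")
s3_client.put_bucket_lifecycle_configuration(
    Bucket="mybucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-files",      # illustrative rule name
                "Filter": {"Prefix": "tmp/"},  # illustrative prefix
                "Status": "Enabled",
                "Expiration": {"Days": 1},     # delete 1 day after creation
            }
        ]
    },
)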

If you really want to delete the objects yourself, use delete_objects() instead of delete_object(). It can accept up to 1000 keys per call, which will be faster than deleting each object individually.
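Since delete_objects() tops out at 1000 keys per call, a longer key list has to be sent in chunks. A minimal sketch (the bucket name and key list are placeholders):

import boto3

s3_client = boto3.client("s3")
keys = [f"prefix/file-{i}.txt" for i in range(2500)]  # placeholder key list

for i in range(0, len(keys), 1000):
    chunk = keys[i:i + 1000]
    s3_client.delete_objects(
        Bucket="mybucket",  # placeholder
        Delete={"Objects": [{"Key": k} for k in chunk]},
    )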

Boto provides support for MultiDelete. Here’s an example of how you would use it:

import boto.s3

conn = boto.s3.connect_to_region('us-east-1')  # or whatever region you want
bucket = conn.get_bucket('mybucket')
keys_to_delete = ['mykey1', 'mykey2', 'mykey3', 'mykey4']
result = bucket.delete_keys(keys_to_delete)

The AWS console now has an option to select an S3 bucket and click the "empty" button. This deletes files 1000 at a time (probably using the delete_objects() API call behind the scenes) without the need to script it or call the API yourself. The only caveat is that you can’t navigate away from the page until the process completes, or it will halt the process. Works well if the console is an option and the bucket in question has fewer than 2 million objects. I’ve noticed it tends to hang after the 2-million-deleted-objects mark.

