How to Filter S3 Events by Object Size?

Posted on March 8, 2025 • 5 min read • 928 words
Aws   Beginner   Helene  

Whether for storage management or cost optimization, we explain how to filter S3 events by object size.

Photo by Helene Hemmerter

I. When Do You Need to Filter S3 Events by Object Size?  

Filtering S3 events by object size is useful in various scenarios, especially when the action you want to trigger depends on the size of the uploaded file. Here are some concrete use cases where this filtering is relevant:

1. Cost Optimization and Storage Management  

  • Automatically deleting oversized files: Prevents the accumulation of unnecessary files that take up space and generate extra costs.
  • Automatically moving files to lower-cost storage (e.g., Glacier) when they exceed a certain size.

2. Process Automation  

  • Executing a specific process on large files (e.g., video/audio conversion, compression, splitting large files into smaller chunks).
  • Triggering a Lambda function to index only relevant files based on their size.

3. Security and Compliance  

  • Avoiding the storage of excessively large files in public buckets, which could lead to abuse (e.g., uploading large archives by malicious users).
  • Triggering an alert or blocking uploads if a user exceeds a defined size limit.

4. Data Flow Optimization  

  • Directing small files to real-time processing (Lambda, Kinesis) and larger files to batch processing to optimize system efficiency.
  • Redirecting files based on their size to different buckets (e.g., files < 1MB to a high-speed access bucket, files > 100MB to cold storage).

5. Monitoring and Reporting  

  • Generating reports on storage usage by filtering objects exceeding a critical size.
  • Detecting anomalies, such as sudden uploads of very large files, which could indicate an issue (e.g., massive log errors, data leaks, etc.).

II. How to Filter S3 Events by Object Size? 3 Methods  

1. Filtering via AWS EventBridge and AWS Lambda  

Amazon S3 can send object-level events to EventBridge, which can then route them to an AWS Lambda function. S3's native event notification filters only match on key prefix and suffix, not object size, so a common approach is to let a Lambda function inspect each event and apply the size filter itself. (EventBridge event patterns do support numeric matching on the object size, but keeping the check in Lambda is convenient when the condition is more complex than a single threshold.)

Steps:  

  1. Enable EventBridge notifications on the bucket, then create an EventBridge rule that captures S3 "Object Created" events and forwards them to an AWS Lambda function (a boto3 sketch of this step follows the list).
  2. Extract the object size from event['detail']['object']['size'] within the Lambda function (EventBridge delivers one event per invocation, with the S3 details under the detail key).
  3. Apply a conditional filter to process only files exceeding a certain size.
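
If you prefer to script step 1 rather than click through the console, here is a minimal boto3 sketch; the rule name, bucket name, and Lambda ARN are placeholders, and the commented-out line shows how EventBridge's numeric matching could express the size filter in the rule pattern itself:

import json
import boto3

events = boto3.client("events")

# Placeholder names for illustration only
RULE_NAME = "s3-object-created-size-filter"
BUCKET_NAME = "my-example-bucket"
LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:filter-by-size"

# Match "Object Created" events emitted by the bucket
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [BUCKET_NAME]},
        # "object": {"size": [{"numeric": [">", 10_000_000]}]},  # optional: filter in the pattern
    },
}

events.put_rule(
    Name=RULE_NAME,
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# Point the rule at the Lambda function (the function also needs a
# resource-based permission allowing events.amazonaws.com to invoke it).
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "size-filter-lambda", "Arn": LAMBDA_ARN}],
)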

Example Lambda Code (Python):  

import json

def lambda_handler(event, context):
    # EventBridge delivers one S3 event per invocation; the S3 details
    # are nested under the 'detail' key.
    detail = event['detail']
    bucket_name = detail['bucket']['name']
    object_key = detail['object']['key']
    object_size = detail['object']['size']  # Size in bytes

    if object_size > 10_000_000:  # Example: 10 MB
        print(f"File {object_key} in {bucket_name} exceeds 10MB. Size: {object_size} bytes")
        # Perform a specific action (e.g., store info, send an alert, etc.)

    return {
        'statusCode': 200,
        'body': json.dumps('Filtering completed')
    }

You can then configure a destination to store these filtered events.
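
For example, here is a minimal sketch, assuming a hypothetical SQS queue, that publishes each matching object as a small JSON message; it could be called from the handler above in place of the print statement:

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/large-objects"  # hypothetical queue

def notify_large_object(bucket_name, object_key, object_size):
    # Describe the oversized object so a downstream consumer can act on it
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "bucket": bucket_name,
            "key": object_key,
            "size": object_size,
        }),
    )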


2. Using AWS S3 Inventory with Athena  

If you do not need real-time filtering but instead prefer periodic analysis:

  1. Enable S3 Inventory (a daily or weekly report of objects in a bucket).
  2. Use AWS Athena to query these CSV/Parquet files and filter by size.

Example SQL Query with Athena:  

SELECT key, size
FROM s3_inventory_table
WHERE size > 10000000;  -- Filters objects larger than 10MB
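
The query assumes the inventory has already been registered as an Athena table. As a rough sketch, assuming a Parquet-formatted inventory with the default Hive-style output layout, the table definition could look like the following (adjust the columns to the fields selected in your inventory configuration and the LOCATION to your inventory destination):

CREATE EXTERNAL TABLE s3_inventory_table (
  `bucket` string,
  `key` string,
  `size` bigint,
  `last_modified_date` timestamp
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.symlink.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://my-inventory-destination/my-source-bucket/daily-size-inventory/hive/';

After creating the table, run MSCK REPAIR TABLE s3_inventory_table; so Athena discovers the dt partitions before you query.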

3. Using Amazon S3 Batch Operations  

If the goal is to automatically delete or move oversized files, you can:

  1. Enable S3 Inventory.
  2. Filter the inventory by size, then create an S3 Batch Operations job from the filtered list to execute an action (e.g., moving files to another bucket or deleting them).

Example Use Case: Moving Large Files to Glacier  

Imagine you want to move all files larger than 100MB to an archive bucket whose objects transition to the Glacier storage class.

1️⃣ Generate a List of Existing Files  

  • Enable S3 Inventory to generate a CSV file containing all objects and their sizes.
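
This can be done from the S3 console (Management → Inventory configurations) or programmatically. Here is a minimal boto3 sketch; the bucket names and configuration ID are placeholders:

import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-source-bucket"          # placeholder
DESTINATION_BUCKET = "my-inventory-bucket"  # placeholder
CONFIG_ID = "daily-size-inventory"          # placeholder

s3.put_bucket_inventory_configuration(
    Bucket=SOURCE_BUCKET,
    Id=CONFIG_ID,
    InventoryConfiguration={
        "Id": CONFIG_ID,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Include the Size field so the report can be filtered by size later
        "OptionalFields": ["Size"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{DESTINATION_BUCKET}",
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
    },
)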

2️⃣ Filter Files by Size  

  • Download the Inventory CSV file and filter only objects larger than 100MB.
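
As a rough sketch, assuming the inventory CSV lists bucket, key, and size as its first three columns (the actual column order depends on the fields selected in the inventory configuration), the filtering could look like this:

import csv

SIZE_THRESHOLD = 100_000_000  # 100 MB in bytes

with open("inventory.csv", newline="") as source, \
     open("manifest.csv", "w", newline="") as manifest:
    reader = csv.reader(source)
    writer = csv.writer(manifest)
    for row in reader:
        bucket, key, size = row[0], row[1], row[2]
        if int(size) > SIZE_THRESHOLD:
            # S3 Batch Operations manifests expect "bucket,key" rows with no header
            writer.writerow([bucket, key])

Upload the resulting manifest.csv to S3 so the Batch Operations job can reference it.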

3️⃣ Create an S3 Batch Operations Job  

In the AWS S3 → Batch Operations console:

  • Select Create job.
  • Provide the filtered CSV file as the manifest.
  • Choose the Copy operation.
  • Configure the destination: a bucket with a lifecycle policy to send files to Glacier.

4️⃣ Launch the Job and Monitor Execution  

  • S3 Batch Operations automatically processes all listed objects.
  • AWS provides a detailed report on processed objects.
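
If you prefer to create and launch the job programmatically rather than through the console, here is a minimal sketch using the s3control API; the account ID, every ARN, and the manifest ETag are placeholders, and the StorageClass setting is an optional alternative to relying on a lifecycle policy on the destination bucket:

import boto3

s3control = boto3.client("s3control")

# Placeholder identifiers for illustration only
ACCOUNT_ID = "123456789012"
ROLE_ARN = "arn:aws:iam::123456789012:role/s3-batch-operations-role"
MANIFEST_ARN = "arn:aws:s3:::my-inventory-bucket/manifest.csv"
MANIFEST_ETAG = "exampleetag0123456789abcdef"
DESTINATION_BUCKET_ARN = "arn:aws:s3:::my-archive-bucket"
REPORT_BUCKET_ARN = "arn:aws:s3:::my-report-bucket"

response = s3control.create_job(
    AccountId=ACCOUNT_ID,
    ConfirmationRequired=False,  # start the job without manual confirmation in the console
    Priority=10,
    RoleArn=ROLE_ARN,
    # Copy every listed object into the destination bucket; StorageClass
    # can send the copies straight to Glacier instead of using a lifecycle rule.
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": DESTINATION_BUCKET_ARN,
            "StorageClass": "GLACIER",
        }
    },
    # The filtered "bucket,key" CSV produced in step 2
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": MANIFEST_ARN,
            "ETag": MANIFEST_ETAG,
        },
    },
    # Completion report written once the job finishes
    Report={
        "Bucket": REPORT_BUCKET_ARN,
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
print("Created job:", response["JobId"])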

III. Comparison of AWS S3 Methods  

Criterion by criterion, the three approaches compare as follows:

  • Reactivity: Lambda + EventBridge is real-time; S3 Inventory + Athena is delayed (periodic, based on the inventory frequency); S3 Batch Operations is semi-automatic (the job must be created manually).
  • Complexity: Lambda + EventBridge is medium (requires a Lambda script); Inventory + Athena is low (a simple SQL query); Batch Operations is medium (requires an Inventory file or object list).
  • Cost: Lambda + EventBridge can be high (if triggered frequently); Inventory + Athena is low (storage plus Athena query costs); Batch Operations is medium (charged per executed action).
  • Scalability: Lambda + EventBridge is high (continuous event handling); Inventory + Athena is very high (can analyze millions of objects); Batch Operations is high (processes millions of objects).
  • Primary use case: Lambda + EventBridge triggers immediate actions on specific objects; Inventory + Athena analyzes and reports on large numbers of objects; Batch Operations performs mass actions (copying, deleting, modifying metadata).
  • Advanced filtering: Lambda + EventBridge yes (via Python code); Inventory + Athena yes (via advanced SQL queries); Batch Operations limited (based on the provided list).
  • Ease of setup: Lambda + EventBridge is relatively simple (requires EventBridge + Lambda setup); Inventory + Athena is easy (requires enabling Inventory and writing SQL); Batch Operations is medium (simple configuration but requires an Inventory file).

Conclusion  

Filtering S3 events by object size is essential for cost optimization, automating large file processing, enhancing security, and improving monitoring. Depending on the need, different solutions exist: AWS Lambda with EventBridge, S3 Inventory with Athena, or S3 Batch Operations. Each approach offers a level of flexibility and performance suited to different use cases.

If your goal is real-time reaction, using AWS Lambda with EventBridge is recommended. If you prefer a more analytical and periodic approach, Athena with S3 Inventory is a solid solution. Finally, for automated actions on existing objects, S3 Batch Operations enables bulk file processing.

By applying these methods, you can better manage your AWS resources and optimize your S3 infrastructure.
