How to Filter S3 Events by Object Size?

Posted on March 8, 2025 • 5 min read • 928 words
Aws   Beginner   Helene  

Whether for storage management or cost optimization, we explain how to filter S3 events by object size.

Photo by Helene Hemmerter

I. When Do You Need to Filter S3 Events by Object Size?  

Filtering S3 events by object size is useful in various scenarios, especially when the action you want to trigger depends on the size of the uploaded file. Here are some concrete use cases where this filtering is relevant:

1. Cost Optimization and Storage Management  

  • Automatically deleting oversized files: Prevents the accumulation of unnecessary files that take up space and generate extra costs.
  • Automatically moving files to lower-cost storage (e.g., Glacier) when they exceed a certain size.

2. Process Automation  

  • Executing a specific process on large files (e.g., video/audio conversion, compression, splitting large files into smaller chunks).
  • Triggering a Lambda function to index only relevant files based on their size.

3. Security and Compliance  

  • Avoiding the storage of excessively large files in public buckets, which could lead to abuse (e.g., uploading large archives by malicious users).
  • Triggering an alert or blocking uploads if a user exceeds a defined size limit.

4. Data Flow Optimization  

  • Directing small files to real-time processing (Lambda, Kinesis) and larger files to batch processing to optimize system efficiency.
  • Redirecting files based on their size to different buckets (e.g., files < 1MB to a high-speed access bucket, files > 100MB to cold storage).

5. Monitoring and Reporting  

  • Generating reports on storage usage by filtering objects exceeding a critical size.
  • Detecting anomalies, such as sudden uploads of very large files, which could indicate an issue (e.g., massive log errors, data leaks, etc.).

II. How to Filter S3 Events by Object Size? 3 Methods  

1. Filtering via AWS EventBridge and AWS Lambda  

Amazon S3 can send object-level events to EventBridge, which can then route them to an AWS Lambda function. S3's native event notification filters only match on key prefix and suffix, not object size, so a common approach is to let a Lambda function inspect each event and apply the size filter itself. (EventBridge event patterns do support numeric matching on the object size, but keeping the check in Lambda is convenient when the condition is more complex than a single threshold.)

Steps:  

  1. Enable EventBridge notifications on the bucket, then create an EventBridge rule that captures S3 "Object Created" events and forwards them to an AWS Lambda function (a boto3 sketch of this step follows the list).
  2. Extract the object size from event['detail']['object']['size'] within the Lambda function (EventBridge delivers one event per invocation, with the S3 details under the detail key).
  3. Apply a conditional filter to process only files exceeding a certain size.
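
If you prefer to script step 1 rather than click through the console, here is a minimal boto3 sketch; the rule name, bucket name, and Lambda ARN are placeholders, and the commented-out line shows how EventBridge's numeric matching could express the size filter in the rule pattern itself:

import json
import boto3

events = boto3.client("events")

# Placeholder names for illustration only
RULE_NAME = "s3-object-created-size-filter"
BUCKET_NAME = "my-example-bucket"
LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:filter-by-size"

# Match "Object Created" events emitted by the bucket
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [BUCKET_NAME]},
        # "object": {"size": [{"numeric": [">", 10_000_000]}]},  # optional: filter in the pattern
    },
}

events.put_rule(
    Name=RULE_NAME,
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# Point the rule at the Lambda function (the function also needs a
# resource-based permission allowing events.amazonaws.com to invoke it).
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "size-filter-lambda", "Arn": LAMBDA_ARN}],
)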

Example Lambda Code (Python):  

import json

def lambda_handler(event, context):
    # EventBridge delivers one S3 event per invocation; the S3 details
    # are nested under the 'detail' key.
    detail = event['detail']
    bucket_name = detail['bucket']['name']
    object_key = detail['object']['key']
    object_size = detail['object']['size']  # Size in bytes

    if object_size > 10_000_000:  # Example: 10 MB
        print(f"File {object_key} in {bucket_name} exceeds 10MB. Size: {object_size} bytes")
        # Perform a specific action (e.g., store info, send an alert, etc.)

    return {
        'statusCode': 200,
        'body': json.dumps('Filtering completed')
    }

You can then configure a destination to store these filtered events.
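
For example, here is a minimal sketch, assuming a hypothetical SQS queue, that publishes each matching object as a small JSON message; it could be called from the handler above in place of the print statement:

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/large-objects"  # hypothetical queue

def notify_large_object(bucket_name, object_key, object_size):
    # Describe the oversized object so a downstream consumer can act on it
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "bucket": bucket_name,
            "key": object_key,
            "size": object_size,
        }),
    )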


2. Using AWS S3 Inventory with Athena  

If you do not need real-time filtering but instead prefer periodic analysis:

  1. Enable S3 Inventory (a daily or weekly report of objects in a bucket).
  2. Use AWS Athena to query these CSV/Parquet files and filter by size.

Example SQL Query with Athena:  

SELECT key, size
FROM s3_inventory_table
WHERE size > 10000000;  -- Filters objects larger than 10MB
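
The query assumes the inventory has already been registered as an Athena table. As a rough sketch, assuming a Parquet-formatted inventory with the default Hive-style output layout, the table definition could look like the following (adjust the columns to the fields selected in your inventory configuration and the LOCATION to your inventory destination):

CREATE EXTERNAL TABLE s3_inventory_table (
  `bucket` string,
  `key` string,
  `size` bigint,
  `last_modified_date` timestamp
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.symlink.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://my-inventory-destination/my-source-bucket/daily-size-inventory/hive/';

After creating the table, run MSCK REPAIR TABLE s3_inventory_table; so Athena discovers the dt partitions before you query.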

3. Using Amazon S3 Batch Operations  

If the goal is to automatically delete or move oversized files, you can:

  1. Enable S3 Inventory.
  2. Filter the inventory by size, then create an S3 Batch Operations job from the filtered list to execute an action (e.g., moving files to another bucket or deleting them).

Example Use Case: Moving Large Files to Glacier  

Imagine you want to move all files larger than 100MB to an archive bucket whose objects transition to the Glacier storage class.

1️⃣ Generate a List of Existing Files  

  • Enable S3 Inventory to generate a CSV file containing all objects and their sizes.
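
This can be done from the S3 console (Management → Inventory configurations) or programmatically. Here is a minimal boto3 sketch; the bucket names and configuration ID are placeholders:

import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-source-bucket"          # placeholder
DESTINATION_BUCKET = "my-inventory-bucket"  # placeholder
CONFIG_ID = "daily-size-inventory"          # placeholder

s3.put_bucket_inventory_configuration(
    Bucket=SOURCE_BUCKET,
    Id=CONFIG_ID,
    InventoryConfiguration={
        "Id": CONFIG_ID,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Include the Size field so the report can be filtered by size later
        "OptionalFields": ["Size"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{DESTINATION_BUCKET}",
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
    },
)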

2️⃣ Filter Files by Size  

  • Download the Inventory CSV file and filter only objects larger than 100MB.
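
As a rough sketch, assuming the inventory CSV lists bucket, key, and size as its first three columns (the actual column order depends on the fields selected in the inventory configuration), the filtering could look like this:

import csv

SIZE_THRESHOLD = 100_000_000  # 100 MB in bytes

with open("inventory.csv", newline="") as source, \
     open("manifest.csv", "w", newline="") as manifest:
    reader = csv.reader(source)
    writer = csv.writer(manifest)
    for row in reader:
        bucket, key, size = row[0], row[1], row[2]
        if int(size) > SIZE_THRESHOLD:
            # S3 Batch Operations manifests expect "bucket,key" rows with no header
            writer.writerow([bucket, key])

Upload the resulting manifest.csv to S3 so the Batch Operations job can reference it.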

3️⃣ Create an S3 Batch Operations Job  

In the AWS S3 → Batch Operations console:

  • Select Create job.
  • Provide the filtered CSV file as the manifest.
  • Choose the Copy operation.
  • Configure the destination: a bucket with a lifecycle policy to send files to Glacier.

4️⃣ Launch the Job and Monitor Execution  

  • S3 Batch Operations automatically processes all listed objects.
  • AWS provides a detailed report on processed objects.
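
If you prefer to create and launch the job programmatically rather than through the console, here is a minimal sketch using the s3control API; the account ID, every ARN, and the manifest ETag are placeholders, and the StorageClass setting is an optional alternative to relying on a lifecycle policy on the destination bucket:

import boto3

s3control = boto3.client("s3control")

# Placeholder identifiers for illustration only
ACCOUNT_ID = "123456789012"
ROLE_ARN = "arn:aws:iam::123456789012:role/s3-batch-operations-role"
MANIFEST_ARN = "arn:aws:s3:::my-inventory-bucket/manifest.csv"
MANIFEST_ETAG = "exampleetag0123456789abcdef"
DESTINATION_BUCKET_ARN = "arn:aws:s3:::my-archive-bucket"
REPORT_BUCKET_ARN = "arn:aws:s3:::my-report-bucket"

response = s3control.create_job(
    AccountId=ACCOUNT_ID,
    ConfirmationRequired=False,  # start the job without manual confirmation in the console
    Priority=10,
    RoleArn=ROLE_ARN,
    # Copy every listed object into the destination bucket; StorageClass
    # can send the copies straight to Glacier instead of using a lifecycle rule.
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": DESTINATION_BUCKET_ARN,
            "StorageClass": "GLACIER",
        }
    },
    # The filtered "bucket,key" CSV produced in step 2
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": MANIFEST_ARN,
            "ETag": MANIFEST_ETAG,
        },
    },
    # Completion report written once the job finishes
    Report={
        "Bucket": REPORT_BUCKET_ARN,
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
print("Created job:", response["JobId"])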

III. Comparison of AWS S3 Methods  

Criterion by criterion, the three approaches compare as follows:

  • Reactivity: Lambda + EventBridge is real-time; S3 Inventory + Athena is delayed (periodic, based on the inventory frequency); S3 Batch Operations is semi-automatic (the job must be created manually).
  • Complexity: Lambda + EventBridge is medium (requires a Lambda script); Inventory + Athena is low (a simple SQL query); Batch Operations is medium (requires an Inventory file or object list).
  • Cost: Lambda + EventBridge can be high (if triggered frequently); Inventory + Athena is low (storage plus Athena query costs); Batch Operations is medium (charged per executed action).
  • Scalability: Lambda + EventBridge is high (continuous event handling); Inventory + Athena is very high (can analyze millions of objects); Batch Operations is high (processes millions of objects).
  • Primary use case: Lambda + EventBridge triggers immediate actions on specific objects; Inventory + Athena analyzes and reports on large numbers of objects; Batch Operations performs mass actions (copying, deleting, modifying metadata).
  • Advanced filtering: Lambda + EventBridge yes (via Python code); Inventory + Athena yes (via advanced SQL queries); Batch Operations limited (based on the provided list).
  • Ease of setup: Lambda + EventBridge is relatively simple (requires EventBridge + Lambda setup); Inventory + Athena is easy (requires enabling Inventory and writing SQL); Batch Operations is medium (simple configuration but requires an Inventory file).

Conclusion  

Filtering S3 events by object size is essential for cost optimization, automating large file processing, enhancing security, and improving monitoring. Depending on the need, different solutions exist: AWS Lambda with EventBridge, S3 Inventory with Athena, or S3 Batch Operations. Each approach offers a level of flexibility and performance suited to different use cases.

If your goal is real-time reaction, using AWS Lambda with EventBridge is recommended. If you prefer a more analytical and periodic approach, Athena with S3 Inventory is a solid solution. Finally, for automated actions on existing objects, S3 Batch Operations enables bulk file processing.

By applying these methods, you can better manage your AWS resources and optimize your S3 infrastructure.
