Properly handling Escape Characters in Boto3

Question

I have an S3 bucket streaming logs to a Lambda function that tags files based on some logic.

I have worked around this issue in the past, and I understand that some characters need special handling. I'm wondering whether there is a safe way to handle this through some API, or whether it is something I need to handle on my own.

For example, I have a Lambda function like so:

import boto3

def lambda_handler(event, context):
    s3 = boto3.client("s3")

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        objectName = record["s3"]["object"]["key"]

        tags = []
        
        if "Pizza" in objectName:
            tags.append({"Key" : "Project", "Value" : "Great"})
        if "Hamburger" in objectName:
            tags.append({"Key" : "Project", "Value" : "Good"})
        if "Liver" in objectName:
            tags.append({"Key" : "Project", "Value" : "Yuck"})

        s3.put_object_tagging(
            Bucket=bucket,
            Key=objectName,
            Tagging={
                "TagSet" : tags
            }
        )

    
    return {
        'statusCode': 200,
    }

This code works great. I upload a file to S3 called Pizza-Is-Better-Than-Liver.txt, then the function runs and tags the file with both Great and Yuck (sorry for the strained example).

However, if I upload the file Pizza Is+AmazeBalls.txt, things go sideways:

Looking at the event in CloudWatch, the object key shows as: Pizza+Is%2BAmazeBalls.txt.

Obviously the space is escaped to a + and the + to a %2B. When I pass that key to put_object_tagging(), it fails with a NoSuchKey error.

My question: is there a defined way to deal with escaped characters in boto3 or some other SDK, or do I just need to do it myself? I really don't want to add any modules to the function, and I could just do a contains / replace(), but it's odd that I would get something back that I can't immediately use without some transformation.

I'm not uploading the files and can't mandate what they call things (i-have-tried-but-it-fails). If it's a valid Windows or Mac filename, it should work (I get that this is a whole other issue, but I can deal with that).

Answer 1

Since there are no other answers, I guess I'll post my band-aid:

def format_path(path):
    path = path.replace("+", " ")
    path = path.replace("%21", "!")
    path = path.replace("%24", "$")
    path = path.replace("%26", "&")
    path = path.replace("%27", "'")
    path = path.replace("%28", "(")
    path = path.replace("%29", ")")
    path = path.replace("%2B", "+")
    path = path.replace("%40", "@")
    path = path.replace("%3A", ":")
    path = path.replace("%3B", ";")
    path = path.replace("%2C", ",")
    path = path.replace("%3D", "=")
    path = path.replace("%3F", "?")
    return path

I'm sure there is a simpler, more complete way to do this, but this seems to work... for now.
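One likely candidate for that simpler way: object keys in S3 event notifications are URL-encoded, and Python's standard library can decode them with urllib.parse.unquote_plus, so no extra module needs to be packaged with the function. The sketch below is only an illustration under that assumption; the helper names decode_s3_key and iter_objects and the sample bucket name are hypothetical and not part of boto3 or the original code.

```python
from urllib.parse import unquote_plus


def decode_s3_key(raw_key):
    """Decode an object key as it appears in an S3 event notification."""
    # unquote_plus first turns "+" into a space, then resolves %XX escapes,
    # so a literal plus sign (sent as %2B) comes back as "+".
    return unquote_plus(raw_key, encoding="utf-8")


def iter_objects(event):
    """Yield (bucket, decoded_key) pairs for each record in the event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        yield bucket, decode_s3_key(record["s3"]["object"]["key"])


if __name__ == "__main__":
    # Hypothetical event trimmed down to the fields the handler reads.
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "Pizza+Is%2BAmazeBalls.txt"},
                }
            }
        ]
    }
    for bucket, key in iter_objects(sample_event):
        print(bucket, key)
```

In the handler from the question, the decoded key would be the value passed as Key to put_object_tagging(). Unlike the replace() chain, this also covers escapes that were not listed by hand, such as %25 for a literal percent sign.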
