UnicodeDecodeError in Python for CLI event, AWS Kinesis + Lambda

I have an AWS Kinesis stream (with server-side encryption) set up, with a Lambda function for event handling. I am trying to test it from the CLI and get an error.

CLI command for inserting an event:

aws kinesis put-record \
--stream-name "dev" \
--partition-key "user_event" \
--data '
    [{
        "user_id": 2002,
        "group_id": 2002,
        "created_at": "2021-11-24T16:03:03Z",
        "user_event_id": 989898989898,
        "event_type": "in",
        "event_date": "2022-11-03T14:03:03Z",
        "category": "music",
        "is_free": true
    }]' \
--profile dev \
--region eu-central-1

Parsing code that throws the error:

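import base64
import json

for record in kinesis_data["Records"]:
    partition_key = record["kinesis"]["partitionKey"]

    # Skip records whose partition key this parser does not handle
    if not self.accepted_parsers(partition_key):
        continue

    # Lambda receives the Kinesis record data base64-encoded
    encoded_data = record["kinesis"]["data"]
    payload_string = base64.b64decode(encoded_data)

    # json.loads accepts bytes and detects UTF-8/16/32 itself
    payload = json.loads(payload_string)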
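For reference, here is a self-contained sketch of the same parsing loop that decodes the bytes explicitly, so a bad record fails with a message naming its partition key instead of a bare UnicodeDecodeError deep inside json.loads. The function name parse_kinesis_records and the accepted_partition_keys set are hypothetical stand-ins for the question's class and its self.accepted_parsers check; the event layout (Records, kinesis, partitionKey, data) is the one the code above already reads.

```python
import base64
import json


def parse_kinesis_records(event, accepted_partition_keys):
    """Return the decoded JSON payload of every accepted record in a Lambda Kinesis event."""
    payloads = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        partition_key = kinesis["partitionKey"]

        # Hypothetical stand-in for self.accepted_parsers(partition_key)
        if partition_key not in accepted_partition_keys:
            continue

        # Lambda delivers Kinesis data base64-encoded; validate=True raises
        # binascii.Error on non-alphabet characters instead of dropping them.
        raw = base64.b64decode(kinesis["data"], validate=True)
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # Non-UTF-8 bytes mean the producer did not send UTF-8 JSON text.
            raise ValueError(
                f"record {partition_key!r} is not UTF-8 JSON; "
                f"first bytes: {raw[:8]!r}"
            ) from exc
        payloads.append(json.loads(text))
    return payloads


# Build a test event the way Lambda presents Kinesis records.
body = json.dumps([{"user_id": 2002, "event_type": "in", "is_free": True}])
event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "user_event",
                "data": base64.b64encode(body.encode("utf-8")).decode("ascii"),
            }
        }
    ]
}
print(parse_kinesis_records(event, {"user_event"}))
# → [[{'user_id': 2002, 'event_type': 'in', 'is_free': True}]]
```

Since the posted JSON is a list, each payload is itself a list of event dicts.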

The error itself:

[ERROR] UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 0: invalid start byte
Traceback (most recent call last):
  File "/var/task/events_processing.py", line 38, in lambda_handler
    status_code = lambda_interactor.call(event)
  File "/var/task/lib/lambda_interactor.py", line 113, in call
    event_dicts = self.kinesis_parser.call(payload)
  File "/var/task/lib/kinesis_parser.py", line 22, in call
    payload = json.loads(payload_string)
  File "/var/lang/lib/python3.8/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')

I tried printing the intermediate values to debug:

  • encoded_data: userid2002groupid2002createdat20211124T160303Zusereventid989898989898eventtypeineventdate20221103T140303Zcategorymusicisfreetrue

  • payload_string: b"\x99\xe9\x9bz\xb8\x9d\xdbM6r\x89\xa6\xbax\xad\xca'v\xd3M\x9c\xad\xe6\xady\xd6\xad\xdbM\xb5\xd7]\xb8O^\xb4\xdfM\xd9z\xf7\xa7\xb5\xba\xe2\xb5\xe7\xafz{bw\xdf=\xf3\xdf=\xf3\xdf=\xf1\xeb\xde\x9e\xdbr\xa5\xe8\xa7z\xf7\xa7\xb5\xd6\xad{m6\xdb]t\xdd=x\xd3}7e\xc6\xadz\n+\xcak\xac\x89\xc8\xac~\xb7\x9e\xb6\xbb\x9e"
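The stripped encoded_data string hints at what went wrong: the JSON text itself was treated as base64. The sketch below reproduces that in plain Python, assuming the CLI decodes --data leniently the way Python's base64.b64decode does by default (validate=False), which silently discards every character outside the base64 alphabet. That assumption about the CLI's internals is mine; the sketch only shows that such a decode yields exactly the encoded_data printed above, plus bytes that json.loads rejects with the same kind of UnicodeDecodeError.

```python
import base64
import json

# The --data JSON from the question, on fewer lines.
json_text = (
    '[{"user_id": 2002, "group_id": 2002, "created_at": "2021-11-24T16:03:03Z", '
    '"user_event_id": 989898989898, "event_type": "in", '
    '"event_date": "2022-11-03T14:03:03Z", "category": "music", "is_free": true}]'
)

# Assumed CLI v2 behavior: --data is base64 input. A lenient decode drops
# {, ", _, :, -, spaces, etc. and decodes only the letters and digits left.
stored_bytes = base64.b64decode(json_text)  # validate=False is the default

# Lambda hands those stored bytes to the function base64-encoded again.
encoded_data = base64.b64encode(stored_bytes).decode("ascii")
print(encoded_data)
# → userid2002groupid2002createdat20211124T160303Zusereventid989898989898eventtypeineventdate20221103T140303Zcategorymusicisfreetrue

# The question's parser then decodes them back to bytes that are not UTF-8.
payload_string = base64.b64decode(encoded_data)
try:
    json.loads(payload_string)
except UnicodeDecodeError as exc:
    print(exc)  # → 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
```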

How can I fix this? I followed examples from the AWS tutorials, and this exact setup worked properly for months, then stopped working about a month ago with no changes in the code. I can see that all the special characters were removed, but I have no idea why.

I found out what changed: AWS CLI v2 handles binary parameters differently from v1 and by default expects --data to be base64-encoded. I had to add the following options to the CLI command:

--region eu-central-1 \
--cli-binary-format raw-in-base64-out

The latter is the important one: it makes the CLI treat the --data text as raw bytes and base64-encode it itself, so the proper message is sent and Lambda receives base64 that decodes back to the original JSON.
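A small sketch of why the flag fixes it, using plain Python to stand in for the CLI and Lambda (no AWS calls are made). It compares the fix above with the alternative of keeping CLI v2's default base64 input and pre-encoding the JSON yourself; both should leave the same bytes in the stream. The sample record and variable names are illustrative only.

```python
import base64
import json

record = [{"user_id": 2002, "event_type": "in", "is_free": True}]
json_text = json.dumps(record)

# Option 1 (the fix above): with --cli-binary-format raw-in-base64-out the CLI
# reads --data as raw bytes and does the base64 encoding itself, so the stream
# ends up storing the JSON's UTF-8 bytes.
stored_with_flag = json_text.encode("utf-8")

# Option 2 (CLI v2 default, base64 input): pre-encode the JSON and pass this
# string as --data; the CLI decodes it back to the same bytes.
data_arg = base64.b64encode(json_text.encode("utf-8")).decode("ascii")
stored_default = base64.b64decode(data_arg, validate=True)

assert stored_with_flag == stored_default

# Lambda delivers the stored bytes base64-encoded; the question's parsing
# code (b64decode + json.loads) now gets valid UTF-8 JSON.
lambda_data = base64.b64encode(stored_with_flag).decode("ascii")
print(json.loads(base64.b64decode(lambda_data)))
# → [{'user_id': 2002, 'event_type': 'in', 'is_free': True}]
```

If you use the flag often, AWS CLI v2 also accepts cli_binary_format = raw-in-base64-out in a profile section of ~/.aws/config, so it does not have to be repeated on every command.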
