AWS Lambda JSON Formatting

I am trying to change how the output of Amazon Transcribe looks, and I am doing this through a Lambda function. Transcribe produces a JSON file with the relevant information, which is then stored in an S3 bucket. This triggers another Lambda, which is supposed to format it accordingly.

The JSON file is too large to post here, but it can be found at the following link: https://paste.pythondiscord.com/ebenugoqok.py

I am trying to get the transcript to look like this:

ch_1 :  just
ch_0 :  Hey,  Steve,  this is Mike calling ARM.  How you doing this afternoon? 
ch_1 :  I'm alright.  How are you doing? 
ch_0 :  Um,  doing well.  So,  uh,  ARM just have me reach out here today to update things on our end and make sure that we're keeping you guys up to date with the latest pricing specs and technology for 2020.  Um,  just looking to see if you're more focused on,  uh,  networking or data center initiatives right now. 
ch_1 :  Ah,  either be crazy.  But the
ch_0 :  Yeah,  I was just doing a great I gotcha.  I gotcha.  And,  um are you guys currently working with a ARM partner? 
ch_1 :  Yeah Oh yeah, 
ch_0 :  You're right. 
ch_1 :  That was one of the people that thing, 
ch_0 :  Okay.  Alright.  And,  uh,  just like an update the notes here on my end.  Um,  do you guys have any projects that are slated for,  like,  the next 12 months?  Right? 
ch_1 :  but
ch_0 :  Crap.  Okay.  Alright.  I will,  uh,  go ahead and update the nose when I went to reflect that.  Ah,  I do appreciate your time. 
ch_1 :  Alright,  thank you very much.  You have to go. 
ch_0 :  Bye.  Did you pick

I have the following code in my Lambda:

import json
import boto3

def lambda_handler(event, context):
    if event:
        s3 = boto3.client("s3")
        s3_object = event["Records"][0]["s3"]
        bucket_name = s3_object["bucket"]["name"]
        file_name = s3_object["object"]["key"]
        file_obj = s3.get_object(Bucket=bucket_name, Key=file_name)
        transcript_result = json.loads(file_obj["Body"].read())

        channels = transcript_result["results"]["channel_labels"]
        items = transcript_result["results"]["items"]


        speaker_text = []
        flag = False

        temp = None


        with open("/tmp/transcribe.txt", "a") as x:
            for word in items:
                for seg in channels["channels"]:
                    for seg_item in seg["items"]:
                        # print(word["type"])
                        if "start_time" in seg_item and word["type"] != "punctuation":
                            if word["end_time"] == seg_item["end_time"] and word["start_time"] == seg_item["start_time"]:
                                # if word["alternatives"][0]["content"]:
                                if temp != seg["channel_label"]:
                                    x.write("\n")
                                    x.write("{} : ".format(seg["channel_label"]))
                                    speaker_text.append(word["alternatives"][0]["content"])
                                    flag = True
                                    temp = seg["channel_label"]
                                else:
                                    speaker_text.append(word["alternatives"][0]["content"])
                                    flag = True

            if word["type"] == "punctuation":
                x.write(word["alternatives"][0]["content"])
            x.write(" {}".format(' '.join(speaker_text)))

    s3.put_object(Bucket="aws-channel-separation", Key=file_name, Body=json.dumps(speaker_text))

However, the output of this is:

["just", "Hey", "Steve", "this", "is", "Mike", "calling", "ARM", "How", "you", "doing", "this", "afternoon", "I'm", "alright", "How", "are", "you", "doing", "Um", "doing", "well", "So", "uh", "ARM", "just", "have", "me", "reach", "out", "here", "today", "to", "update", "things", "on", "our", "end", "and", "make", "sure", "that", "we're", "keeping", "you", "guys", "up", "to", "date", "with", "the", "latest", "pricing", "specs", "and", "technology", "for", "2020", "Um", "just", "looking", "to", "see", "if", "you're", "more", "focused", "on", "uh", "networking", "or", "data", "center", "initiatives", "right", "now", "Ah", "either", "be", "crazy", "But", "the", "Yeah", "I", "was", "just", "doing", "a", "great", "I", "gotcha", "I", "gotcha", "And", "um", "are", "you", "guys", "currently", "working", "with", "a", "ARM", "partner", "Yeah", "Oh", "yeah", "You're", "right", "That", "was", "one", "of", "the", "people", "that", "thing", "Okay", "Alright", "And", "uh", "just", "like", "an", "update", "the", "notes", "here", "on", "my", "end", "Um", "do", "you", "guys", "have", "any", "projects", "that", "are", "slated", "for", "like", "the", "next", "12", "months", "Right", "but", "Crap", "Okay", "Alright", "I", "will", "uh", "go", "ahead", "and", "update", "the", "nose", "when", "I", "went", "to", "reflect", "that", "Ah", "I", "do", "appreciate", "your", "time", "Alright", "thank", "you", "very", "much", "You", "have", "to", "go", "Bye", "Did", "you", "pick"]

Does anyone know how to get the formatting correct? Any help would be appreciated.

On this line, you have speaker_text formatted as JSON and stored in S3:

s3.put_object(Bucket="aws-channel-separation", Key=file_name, Body=json.dumps(speaker_text))

But it seems like you actually want the text that was written to /tmp/transcribe.txt. You can upload that file instead with:

s3.upload_file(Filename="/tmp/transcribe.txt", Bucket="aws-channel-separation", Key=file_name)
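
If it helps, here is a minimal sketch of an alternative that builds the formatted text in memory rather than going through a temporary file. It assumes the Transcribe JSON has the shape the question's code relies on (results.items and results.channel_labels.channels, with matching start_time and end_time on each spoken word). The names format_transcript, channel_by_time and sample are made up for illustration, and the output uses single spaces rather than the double spaces in the desired transcript.

```python
def format_transcript(transcript_result):
    """Return one 'channel : text' line per speaker turn."""
    results = transcript_result["results"]

    # Map each spoken word's (start_time, end_time) to its channel label.
    # Assumption: these timestamps identify a word uniquely, as in the question.
    channel_by_time = {}
    for channel in results["channel_labels"]["channels"]:
        for item in channel["items"]:
            if "start_time" in item:  # punctuation items carry no timestamps
                key = (item["start_time"], item["end_time"])
                channel_by_time[key] = channel["channel_label"]

    turns = []  # each entry is [channel_label, text_so_far]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Attach punctuation directly to the preceding word.
            if turns:
                turns[-1][1] += content
            continue
        label = channel_by_time.get((item["start_time"], item["end_time"]))
        if label is None and turns:
            # No matching channel item found: stay with the current speaker.
            label = turns[-1][0]
        if turns and turns[-1][0] == label:
            turns[-1][1] += " " + content
        else:
            turns.append([label, content])

    return "\n".join("{} : {}".format(label, text) for label, text in turns)


def _word(text, start, end):
    # Helper for building the hypothetical sample below.
    return {"type": "pronunciation", "start_time": start, "end_time": end,
            "alternatives": [{"content": text}]}


def _punct(text):
    return {"type": "punctuation", "alternatives": [{"content": text}]}


# Tiny hand-written sample, not real Transcribe output.
sample = {"results": {
    "items": [
        _word("just", "0.0", "0.5"),
        _word("Hey", "0.6", "0.9"), _punct(","), _word("Steve", "1.0", "1.4"),
        _word("I'm", "2.0", "2.3"), _word("alright", "2.3", "2.8"), _punct("."),
    ],
    "channel_labels": {"channels": [
        {"channel_label": "ch_0",
         "items": [_word("Hey", "0.6", "0.9"), _punct(","),
                   _word("Steve", "1.0", "1.4")]},
        {"channel_label": "ch_1",
         "items": [_word("just", "0.0", "0.5"), _word("I'm", "2.0", "2.3"),
                   _word("alright", "2.3", "2.8"), _punct(".")]},
    ]},
}}

print(format_transcript(sample))
```

Inside the handler, the returned string could then be passed to s3.put_object as Body (for example encoded with .encode("utf-8")) in place of json.dumps(speaker_text). Building the string in memory also sidesteps the append mode used on /tmp/transcribe.txt, where text from an earlier invocation may still be present if the Lambda container is reused.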
