
JSON problems in AWS Lambda (formatting and reading records one by one)

The idea:

  • I want to send some JSON tweets (from a file) through AWS Kinesis, S3, Lambda, and AWS Comprehend, then back to S3.

The sample tweets:

    [
 {
   "tweet_id": 5675880,
   "airline": "Delta",
   "name": "JetBlueNews",
   "text": "@JetBlue's new CEO seeks the right balance to please passengers and Wall ... - Greenfield Daily ",
   "tweet_coord": [
      null,
      null
   ],
   "tweet_created": "16-02-15 23:36",
   "tweet_location": "USA",
   "user_timezone": "Sydney"
},
 {
   "tweet_id": 5675881,
   "airline": "Delta",
   "name": "nesi_1992",
   "text": "@JetBlue is REALLY getting on my nerves !! 😡😡 #nothappy",
   "tweet_coord": [
      null,
      null
   ],
   "tweet_created": "16-02-15 23:43",
   "tweet_location": "undecided",
   "user_timezone": "Pacific Time (US & Canada)"
},

1- I created a Python script that sends the above tweets to Kinesis. This is the code and the sample output:

The code (note: I am using just one hard-coded value):

def put_to_stream(thing_id, property_value, property_timestamp):

#    payload = {
#                'prop': str(property_value),
#                'timestamp': str(property_timestamp),
#                'thing_id': thing_id
#              }


    payload= {
      "tweet_id": 5676295,
      "airline": "US Airways",
      "name": "liquidfox1",
      "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed",
      "tweet_coord": [
         'null',
          'null'
      ],
      "tweet_created": "17-02-15 11:13",
      "tweet_location": "This is an AD account. 18+",
      "user_timezone": ""
    },

    print (payload)

    put_response = kc.put_record(
                        StreamName=my_stream_name,
                        Data=json.dumps(payload),
                        PartitionKey=thing_id)

while True:
    property_value = random.randint(40, 120)
    property_timestamp = calendar.timegm(datetime.utcnow().timetuple())
    thing_id = str(random.randint(40, 120)) #'aa-bb'

    put_to_stream(thing_id, property_value, property_timestamp)

    # wait for 1 second
    time.sleep(1)

the output:

({'tweet_id': 5676295, 'airline': 'US Airways', 'name': 'liquidfox1', 'text': '@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed', 'tweet_coord': ['null', 'null'], 'tweet_created': '17-02-15 11:13', 'tweet_location': 'This is an AD account. 18+', 'user_timezone': ''},)

The problem is now in my Lambda function's Python code: I want to extract the text from the JSON and pass it to Comprehend to get the resulting sentiment.

This is the code I use to read just one file:

objectf = s3.Object(bucket_name, in_key_name).get()["Body"].read().decode('utf-8')

this is the output:

[{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. 
So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}][{"tweet_id": 5676295, "airline": "US Airways", "name": "liquidfox1", "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed", "tweet_coord": ["null", "null"], "tweet_created": "17-02-15 11:13", "tweet_location": "This is an AD account. 18+", "user_timezone": ""}]

Then, when I try to convert it using "json.dumps", I get this output:

"[{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 
18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}][{\"tweet_id\": 5676295, \"airline\": \"US Airways\", \"name\": \"liquidfox1\", \"text\": \"@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed\", \"tweet_coord\": [\"null\", \"null\"], \"tweet_created\": \"17-02-15 11:13\", \"tweet_location\": \"This is an AD account. 18+\", \"user_timezone\": \"\"}]"

I don't know what the problem is with this code.

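You have a trailing comma in the assignment of payload. In Python that comma turns payload into a one-element tuple instead of a dict (your printed output is wrapped in "( ... ,)"), and json.dumps serializes a tuple as a JSON array, which is why every record in S3 is wrapped in [ ]. Remove it and everything should work: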

import calendar
import json
import random
import time
from datetime import datetime

# kc (the Kinesis client) and my_stream_name are assumed to be defined
# elsewhere, as in the question.

def put_to_stream(thing_id, property_value, property_timestamp):

    # payload = {
    #     'prop': str(property_value),
    #     'timestamp': str(property_timestamp),
    #     'thing_id': thing_id
    # }

    # No trailing comma after the closing brace, so payload stays a dict.
    payload = {
        "tweet_id": 5676295,
        "airline": "US Airways",
        "name": "liquidfox1",
        "text": "@USAirways me too. In the future, have a better harsh weather preparedness plan. So much of your staff called out that everything snowballed",
        "tweet_coord": [
            'null',
            'null'
        ],
        "tweet_created": "17-02-15 11:13",
        "tweet_location": "This is an AD account. 18+",
        "user_timezone": ""
    }

    print(payload)

    put_response = kc.put_record(
        StreamName=my_stream_name,
        Data=json.dumps(payload),
        PartitionKey=thing_id)

while True:
    property_value = random.randint(40, 120)
    property_timestamp = calendar.timegm(datetime.utcnow().timetuple())
    thing_id = str(random.randint(40, 120))  # 'aa-bb'

    put_to_stream(thing_id, property_value, property_timestamp)

    # wait for 1 second
    time.sleep(1)
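
The effect of the trailing comma can be seen in isolation. The snippet below is a minimal sketch that uses a shortened payload (only two of the fields from the question) and no AWS calls; the variable names are made up for illustration.

```python
import json

# A trailing comma after the closing brace makes this a 1-tuple holding a dict.
payload_with_comma = {"tweet_id": 5676295, "airline": "US Airways"},
# Without the comma it is a plain dict.
payload_fixed = {"tweet_id": 5676295, "airline": "US Airways"}

print(type(payload_with_comma).__name__)  # tuple
print(type(payload_fixed).__name__)       # dict

# json.dumps serializes a tuple as a JSON array, hence the [ ... ] wrapper.
with_comma_json = json.dumps(payload_with_comma)
fixed_json = json.dumps(payload_fixed)
print(with_comma_json)  # [{"tweet_id": 5676295, "airline": "US Airways"}]
print(fixed_json)       # {"tweet_id": 5676295, "airline": "US Airways"}
```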

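On the Lambda side, note that json.dumps turns a Python value into a JSON string, so calling it on text that is already JSON only escapes it (that is the backslash-heavy output in the question). To get at the "text" field the string has to be parsed instead. The S3 object in the question holds several JSON documents written back to back with no separator, which json.loads alone would reject, so the sketch below walks the string with json.JSONDecoder.raw_decode. The helper names (iter_json_documents, extract_texts, detect_sentiments) are hypothetical, and the Comprehend call assumes English text and a client created with boto3.client("comprehend"); treat it as one possible approach rather than the required one.

```python
import json


def iter_json_documents(raw):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(raw):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            break
        value, pos = decoder.raw_decode(raw, pos)
        yield value


def extract_texts(raw):
    """Collect the "text" field of every tweet found in raw."""
    texts = []
    for doc in iter_json_documents(raw):
        # Accept both [ {...} ] (tuple payload) and {...} (fixed payload).
        records = doc if isinstance(doc, list) else [doc]
        for record in records:
            texts.append(record["text"])
    return texts


def detect_sentiments(comprehend_client, texts, language_code="en"):
    """Return the Sentiment label Comprehend reports for each text."""
    return [
        comprehend_client.detect_sentiment(
            Text=text, LanguageCode=language_code
        )["Sentiment"]
        for text in texts
    ]


# Shortened stand-in for the S3 object body shown in the question.
sample = '[{"tweet_id": 1, "text": "first tweet"}][{"tweet_id": 2, "text": "second tweet"}]'
texts = extract_texts(sample)
print(texts)
```

In the Lambda function, the string read from S3 (objectf in the question) would take the place of sample. Another option is to append a newline to each record in the producer, so that the file can be split line by line and each line parsed with json.loads.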