Trouble getting nested objects from a JSON array in Python

Question

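Hi, I have a JSON Lines file, and I'm trying to get all of the hashtags (and the mentions, which should be the same process) from the JSONL file here: https://github.com/THsTestingGround/JsonL_Quest_SO/blob/master/output-2020-01-21.jsonl (SO won't let me post the URLs, and there are a lot of them).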

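Here is a reproducible example that gets a single key object. How would I go about getting multiple hashtags (mentions would work the same way)? Right now I have to specify each one manually. Is there any way to get them all in one go? I can get a CSV with this code: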

import json
import csv
import io

# creates a .csv file using a Twitter .json file
# the fields have to be set manually

def extract_json(fileobj):

    # Iterates over an open JSONL file and yields
    # decoded lines.  Closes the file once it has been
    # read completely.

    with fileobj:
        for line in fileobj:
            yield json.loads(line)

#path to the jsonl file
data_json = io.open('output-2020-01-21.json', mode='r', encoding='utf-8') # opens the JSONL file
data_python = extract_json(data_json)

csv_out = io.open('tweets_out_utf8.csv', mode='w', encoding='utf-8') #opens csv file

#if you're adding additional columns please don't forget to add them here
fields = u'created_at,text,full_text,screen_name,followers,friends,rt,fav' #field names
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:

    #because retweet is not common, sometimes jsonl won't have the key, so this is safer
    try:
        retweeted_status_full_text = '"' +line.get('retweeted_status').get('full_text').replace('"','""') + '"'
    except:
        retweeted_status_full_text = 'NA'
    # gets only one hashtag even when there are several
    try:
        entities= '"' + line.get('entities').get('hashtags')[0].get('text').replace('"', '""') + '"'
    except:
        entities = 'NA'

    #writes a row and gets the fields from the json object
    #screen_name and followers/friends are found on the second level hence two get methods
    row = [line.get('created_at'),
           '"' + line.get('full_text').replace('"','""') + '"', #creates double quotes
           retweeted_status_full_text,
           line.get('user').get('screen_name'),
           str(line.get('user').get('followers_count')),
           str(line.get('user').get('friends_count')),
           str(line.get('retweet_count')),
           str(line.get('favorite_count'))]



    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()

I did try, but it gave me an error, and I can't seem to find a solution on SO either. I'm still a bit weak with JSON, so I'd appreciate any help. Thanks.

Answer 1


import json
import csv
import io

def extract_json(fileobj):
    with fileobj:
        for line in fileobj:
            yield json.loads(line)

data_json = io.open('a.json', mode='r', encoding='utf-8')
data_python = extract_json(data_json)

csv_out = io.open('tweets_out_utf8.csv', mode='w', encoding='utf-8')

fields = u'created_at,text,full_text,screen_name,followers,friends,rt,fav'
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:

    try:
        retweeted_status_full_text = '"' +line.get('retweeted_status').get('full_text').replace('"','""') + '"'
    except:
        retweeted_status_full_text = 'NA'

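    # loop over every hashtag instead of taking only the first one
    try:
        temp = line.get('entities').get('hashtags')
        entities = ""
        for val in temp:
            entities += '"' + val.get('text').replace('"', '""') + '"' + ' '
    except (AttributeError, TypeError):
        # 'entities' or 'hashtags' is missing or null
        entities = ""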

    row = [line.get('created_at'),
           '"' + line.get('full_text').replace('"','""') + '"',
           retweeted_status_full_text,
           line.get('user').get('screen_name'),
           str(line.get('user').get('followers_count')),
           str(line.get('user').get('friends_count')),
           str(line.get('retweet_count')),
           str(line.get('favorite_count'))]

    print('entities' + ' ' + str(entities))

    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()

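I tried something like this. It loops over every item in the `hashtags` list instead of indexing `[0]`, and for tweets with no hashtags I set entities = "" instead of 'NA'.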
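A possible variation, sketched under some assumptions: the tweets are standard Twitter API v1.1 objects in extended mode (so `full_text` is present), where items in `entities.hashtags` carry a `text` key and items in `entities.user_mentions` carry a `screen_name` key. It collects every hashtag and every mention into their own columns, and it lets `csv.writer` handle quoting instead of wrapping fields in `"` by hand. The helper names `entity_texts`, `tweet_to_row` and `convert`, and the `retweeted_full_text`, `hashtags` and `mentions` column names, are hypothetical and not part of the original code.

```python
import csv
import json


def extract_json(path):
    # Yield one decoded object per non-empty line of a JSON Lines file.
    with open(path, mode='r', encoding='utf-8') as fileobj:
        for line in fileobj:
            if line.strip():
                yield json.loads(line)


def entity_texts(tweet, kind, key):
    # Collect `key` from every item in tweet['entities'][kind].
    # A missing or null 'entities' dict or list gives [] instead of raising.
    items = (tweet.get('entities') or {}).get(kind) or []
    return [item.get(key) for item in items if item.get(key)]


def tweet_to_row(tweet):
    # Build one CSV row; nested dicts default to {} when absent.
    user = tweet.get('user') or {}
    retweet = tweet.get('retweeted_status') or {}
    return [
        tweet.get('created_at'),
        tweet.get('full_text'),
        retweet.get('full_text', 'NA'),
        user.get('screen_name'),
        user.get('followers_count'),
        user.get('friends_count'),
        tweet.get('retweet_count'),
        tweet.get('favorite_count'),
        # all hashtags / mentioned screen names, space-separated
        ' '.join(entity_texts(tweet, 'hashtags', 'text')),
        ' '.join(entity_texts(tweet, 'user_mentions', 'screen_name')),
    ]


def convert(jsonl_path, csv_path):
    header = ['created_at', 'text', 'retweeted_full_text', 'screen_name',
              'followers', 'friends', 'rt', 'fav', 'hashtags', 'mentions']
    # newline='' lets the csv module control line endings itself
    with open(csv_path, mode='w', encoding='utf-8', newline='') as csv_out:
        writer = csv.writer(csv_out)  # quotes fields containing '"' or ','
        writer.writerow(header)
        for tweet in extract_json(jsonl_path):
            writer.writerow(tweet_to_row(tweet))
```

For example, `convert('output-2020-01-21.jsonl', 'tweets_out_utf8.csv')` should write one row per tweet. `csv.writer` writes `None` as an empty field and converts the integer counts to strings, so the manual `str()` calls and `'"' + ... + '"'` escaping are no longer needed.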

Answer 1: accepted 2020-05-21 16:44:16