Extracting text from a nested JSON file where each JSON object has a variable number of entries in Python
I have a JSON file containing multiple nested JSON objects, like this:
{
    "coordinates": null,
    "acoustic_features": {
        "instrumentalness": "0.00479",
        "liveness": "0.18",
        "speechiness": "0.0294",
        "danceability": "0.634",
        "valence": "0.342",
        "loudness": "-8.345",
        "tempo": "125.044",
        "acousticness": "0.00035",
        "energy": "0.697",
        "mode": "1",
        "key": "6"
    },
    "artist_id": "b2980c722a1ace7a30303718ce5491d8",
    "place": null,
    "geo": null,
    "tweet_lang": "en",
    "source": "Share.Radionomy.com",
    "track_title": "8eeZ",
    "track_id": "cd52b3e5b51da29e5893dba82a418a4b",
    "artist_name": "Dominion",
    "entities": {
        "hashtags": [{
            "text": "nowplaying",
            "indices": [0, 11]
        }, {
            "text": "goth",
            "indices": [51, 56]
        }, {
            "text": "deathrock",
            "indices": [57, 67]
        }, {
            "text": "postpunk",
            "indices": [68, 77]
        }],
        "symbols": [],
        "user_mentions": [],
        "urls": [{
            "indices": [28, 50],
            "expanded_url": "cathedral13.com/blog13",
            "display_url": "cathedral13.com/blog13",
            "url": "t.co/Tatf4hEVkv"
        }]
    },
    "created_at": "2014-01-01 05:54:21",
    "text": "#nowplaying Dominion - 8eeZ Tatf4hEVkv #goth #deathrock #postpunk",
    "user": {
        "location": "middle of nowhere",
        "lang": "en",
        "time_zone": "Central Time (US & Canada)",
        "name": "Cathedral 13",
        "entities": null,
        "id": 81496937,
        "description": "I\u2019m a music junkie who is currently responsible for Cathedral 13 internet radio (goth, deathrock, post-punk)which has been online since 06/20/02."
    },
    "id": 418243774842929150
}
Each object contains a variable number of hashtags. I want to produce a CSV file containing only the hashtag text. I wrote the following code to do this:
import csv
import json

with open('jsonpart.json') as data_file:
    data = json.load(data_file)
#print (data)
header = ['hashtags']
# open a file for writing
data_csv = open('hashtags.csv', 'wb')
# create the csv writer object
csvwriter = csv.writer(data_csv)
# write the csv header
csvwriter.writerow(header)
for entry in data:
    # this writes the whole list of hashtag dicts into a single cell
    csvwriter.writerow([entry['entities']['hashtags']])
data_csv.close()
I get the following output file:
"[{u'indices': [0, 11], u'text': u'nowplaying'}, {u'indices': [51, 56],
u'text': u'goth'}, {u'indices': [57, 67], u'text': u'deathrock'},
{u'indices': [68, 77], u'text': u'postpunk'}]"
"[{u'indices': [23, 34], u'text': u'NowPlaying'}, {u'indices': [75, 79],
u'text': u'80s'}, {u'indices': [80, 86], u'text': u'Retro'}, {u'indices':
[87, 91], u'text': u'Fun'}]"
"[{u'indices': [0, 11], u'text': u'nowplaying'}]"
"[{u'indices': [54, 65], u'text': u'nowplaying'}, {u'indices': [66, 77],
u'text': u'listenlive'}]"
I am stuck here. How can I get the target file in the following form:
nowplaying
goth
deathrock
postpunk
NowPlaying
80's
Retro
Fun
nowplaying
nowplaying
listenlive
You can use a simple list comprehension. Assuming you have a JSON object named json_chunk, you can build the list like this:
text_list = [hashtag['text'] for hashtag in json_chunk['entities']['hashtags']]
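The comprehension above handles a single object, while the file in the question holds many. A minimal sketch of extending it to every object follows. It assumes the loaded data is a list of dicts shaped like the sample, and that "entities" may be null in some objects (an assumption, based on the null "entities" under "user" in the sample). The function name collect_hashtag_texts and the sample list are hypothetical, made up for illustration.

```python
def collect_hashtag_texts(entries):
    """Return the 'text' of every hashtag across all entries, in order."""
    texts = []
    for entry in entries:
        # Assumption: 'entities' or 'hashtags' may be missing or null,
        # so fall back to an empty container instead of raising.
        entities = entry.get('entities') or {}
        for hashtag in entities.get('hashtags') or []:
            texts.append(hashtag['text'])
    return texts


# Hypothetical sample data shaped like the objects in the question.
sample = [
    {'entities': {'hashtags': [{'text': 'nowplaying', 'indices': [0, 11]},
                               {'text': 'goth', 'indices': [51, 56]}]}},
    {'entities': {'hashtags': []}},
    {'entities': None},
    {'entities': {'hashtags': [{'text': 'listenlive', 'indices': [66, 77]}]}},
]
print(collect_hashtag_texts(sample))
```

Objects with no hashtags simply contribute nothing, so the variable number of entries per object needs no special handling.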
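Now you have a list. Iterate over it (some elements apparently contain a newline while others do not, so strip them all and append a newline to every line), then write each element to the file, like this: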
with open(r'C:\outputfile.csv', 'a', encoding='utf-8') as fd:
    for line in text_list:
        fd.write(line.strip()+'\n')
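Putting the pieces together, here is one possible end-to-end sketch for Python 3. It assumes the input file is a JSON array of objects, as the `for entry in data` loop in the question implies. It uses csv.writer rather than raw writes, which is a swapped-in choice and not part of the answer above. The function name write_hashtags and the temporary demo files are hypothetical.

```python
import csv
import json
import os
import tempfile


def write_hashtags(json_path, csv_path):
    """Write one hashtag text per CSV row; return the number of rows written."""
    with open(json_path, encoding='utf-8') as src:
        entries = json.load(src)  # assumed: a JSON array of tweet objects

    count = 0
    # newline='' lets the csv module control line endings in Python 3.
    with open(csv_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        for entry in entries:
            entities = entry.get('entities') or {}
            for hashtag in entities.get('hashtags') or []:
                # One row per hashtag, not one row per object.
                writer.writerow([hashtag['text'].strip()])
                count += 1
    return count


# Self-contained demo on made-up data in a temporary directory.
demo_dir = tempfile.mkdtemp()
json_path = os.path.join(demo_dir, 'jsonpart.json')
csv_path = os.path.join(demo_dir, 'hashtags.csv')
with open(json_path, 'w', encoding='utf-8') as f:
    json.dump([
        {'entities': {'hashtags': [{'text': 'nowplaying'}, {'text': 'goth'}]}},
        {'entities': {'hashtags': [{'text': 'listenlive'}]}},
    ], f)

rows_written = write_hashtags(json_path, csv_path)
with open(csv_path, newline='', encoding='utf-8') as f:
    rows = [row[0] for row in csv.reader(f)]
print(rows_written, rows)
```

The key difference from the code in the question is the inner loop: each hashtag gets its own writerow call, so the CSV ends up with one tag per line instead of a stringified list per object.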