Pandas: Retrieving nested data from a JSON file
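I'm parsing nested JSON data from the GovTrack bill files (one data.json per bill directory). Some of these files have more than one committee_id associated with them, and I need all of the committees associated with each file. I'm not sure, but I think that would mean writing a new row for each committee_id. My code is as follows: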
import os.path
import csv
import json

path = '/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr'
dirs = os.listdir(path)
outputfile = open('df/h109_s_b', 'w', newline='')
outputwriter = csv.writer(outputfile)

for bill_dir in dirs:
    with open(os.path.join(path, bill_dir, "data.json"), "r") as f:
        data = json.load(f)
    a = data['introduced_at']
    b = data['bill_id']
    c = data['sponsor']['thomas_id']
    d = data['sponsor']['state']
    e = data['sponsor']['name']
    sponsor_type = data['sponsor']['type']  # renamed so it no longer shadows the file handle f
    i = data['subjects_top_term']
    j = data['official_title']
    if data['committees']:
        # only the first committee entry is read; any others are lost
        g = data['committees'][0]['committee_id']
    else:
        g = "None"
    outputwriter.writerow([a, b, c, d, e, sponsor_type, g, i, j])

outputfile.close()
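The problem I'm having is that my code only collects the first committee_id listed. For example, the committees section of the file hr145 looks like this: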
"committees": [
  {
    "activity": [
      "referral",
      "in committee"
    ],
    "committee": "House Transportation and Infrastructure",
    "committee_id": "HSPW"
  },
  {
    "activity": [
      "referral"
    ],
    "committee": "House Transportation and Infrastructure",
    "committee_id": "HSPW",
    "subcommittee": "Subcommittee on Economic Development, Public Buildings and Emergency Management",
    "subcommittee_id": "13"
  },
  {
    "activity": [
      "referral",
      "in committee"
    ],
    "committee": "House Financial Services",
    "committee_id": "HSBA"
  },
  {
    "activity": [
      "referral"
    ],
    "committee": "House Financial Services",
    "committee_id": "HSBA",
    "subcommittee": "Subcommittee on Domestic and International Monetary Policy, Trade, and Technology",
    "subcommittee_id": "19"
  }
]
This is where it gets a bit tricky, because I also want the subcommittee_id to be associated with its committee_id whenever the bill is referred to a subcommittee:
bill_id    committee  subcommittee  introduced_at  thomas_id  state  name
hr145-109  HSPW       na            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSPW       13            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSBA       na            "2005-01-4"    73         NY     "McHugh, John M."
hr145-109  HSBA       19            "2005-01-4"    73         NY     "McHugh, John M."
Any ideas?
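One way to get exactly that layout without pandas is to keep the csv loop and iterate over every element of data['committees'] instead of taking only [0], falling back to "na" when an entry has no subcommittee_id. This is a minimal sketch, assuming each data.json has the keys used in the question's code; the helper name committee_rows and the trimmed sample record (built from the hr145 values above) are hypothetical, and an in-memory buffer stands in for the real output file:

```python
import csv
import io

# Hypothetical, trimmed-down bill record using the hr145 values from the question
sample = {
    "bill_id": "hr145-109",
    "introduced_at": "2005-01-4",
    "sponsor": {"thomas_id": "73", "state": "NY", "name": "McHugh, John M."},
    "committees": [
        {"committee_id": "HSPW"},
        {"committee_id": "HSPW", "subcommittee_id": "13"},
        {"committee_id": "HSBA"},
        {"committee_id": "HSBA", "subcommittee_id": "19"},
    ],
}


def committee_rows(data):
    """Yield one row per entry in data['committees'] (hypothetical helper)."""
    sponsor = data["sponsor"]
    tail = [data["introduced_at"], sponsor["thomas_id"], sponsor["state"], sponsor["name"]]
    committees = data.get("committees") or []
    if not committees:
        # keep bills that were never referred, like the original "None" fallback
        yield [data["bill_id"], "None", "na"] + tail
        return
    for entry in committees:
        # subcommittee_id is only present on subcommittee referrals
        yield [data["bill_id"], entry["committee_id"],
               entry.get("subcommittee_id", "na")] + tail


buf = io.StringIO()  # stands in for the real output file
writer = csv.writer(buf)
writer.writerow(["bill_id", "committee", "subcommittee", "introduced_at",
                 "thomas_id", "state", "name"])
writer.writerows(committee_rows(sample))
```

In the question's loop, the single outputwriter.writerow(...) call would then become outputwriter.writerows(committee_rows(data)), producing one row per committee or subcommittee referral in the column order of the desired table.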
Answer: you can do it like this:
In [111]: with open(fn) as f:
   .....:     data = ujson.load(f)
   .....:

In [112]: committees = pd.json_normalize(data, 'committees')
In [113]: committees
Out[113]:
   activity            committee                                committee_id  subcommittee                            subcommittee_id
0  [referral]          House Energy and Commerce                HSIF          NaN                                     NaN
1  [referral]          House Energy and Commerce                HSIF          Subcommittee on Energy and Air Quality  03
2  [referral]          House Education and the Workforce        HSED          NaN                                     NaN
3  [referral]          House Financial Services                 HSBA          NaN                                     NaN
4  [referral]          House Agriculture                        HSAG          NaN                                     NaN
5  [referral, markup]  House Resources                          HSII          NaN                                     NaN
6  [referral]          House Science                            HSSY          NaN                                     NaN
7  [referral]          House Ways and Means                     HSWM          NaN                                     NaN
8  [referral]          House Transportation and Infrastructure  HSPW          NaN                                     NaN
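To carry the bill-level fields from the desired output onto every committee row, json_normalize also accepts a meta argument listing fields (or nested paths such as ["sponsor", "state"]) to repeat on each record. In pandas 1.0 and later the function is available as pd.json_normalize. This is a sketch, assuming a trimmed hypothetical record built from the hr145 values in the question:

```python
import pandas as pd

# Hypothetical, trimmed-down bill record using the hr145 values from the question
bill = {
    "bill_id": "hr145-109",
    "introduced_at": "2005-01-4",
    "sponsor": {"thomas_id": "73", "state": "NY", "name": "McHugh, John M."},
    "committees": [
        {"committee_id": "HSPW"},
        {"committee_id": "HSPW", "subcommittee_id": "13"},
        {"committee_id": "HSBA"},
        {"committee_id": "HSBA", "subcommittee_id": "19"},
    ],
}

rows = pd.json_normalize(
    bill,
    record_path="committees",
    # bill-level fields copied onto every committee row;
    # nested paths become columns like "sponsor.state"
    meta=["bill_id", "introduced_at",
          ["sponsor", "thomas_id"], ["sponsor", "state"], ["sponsor", "name"]],
)

# reindex creates subcommittee_id even if no entry of this bill has one
out = rows.reindex(columns=["bill_id", "committee_id", "subcommittee_id", "introduced_at",
                            "sponsor.thomas_id", "sponsor.state", "sponsor.name"])
# mirror the question's "na" placeholder for plain committee referrals
out["subcommittee_id"] = out["subcommittee_id"].fillna("na")
```

Bills whose committees list is empty produce no rows with this approach, so they need separate handling if they must appear in the output.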
UPDATE:
If you want to put all of the data into a single DataFrame, you can do it like this:
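import io
import os
import ujson
import pandas as pd

start_path = '/home/jayaramdas/anaconda3/Thesis/govtrack/bills109/hr'

def get_merged_json(start_path):
    # walk every bill directory and load each *.json file into one list of dicts;
    # join with the walked directory p, not the top-level start_path
    return [ujson.load(open(os.path.join(p, f)))
            for p, _, files in os.walk(start_path)
            for f in files
            if f.endswith('.json')
            ]

data = get_merged_json(start_path)
# wrap the JSON string in StringIO; newer pandas deprecates passing literal JSON text
df = pd.read_json(io.StringIO(ujson.dumps(data)))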
PS: this puts all of the committees into a single column as JSON data.
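If the merged frame is kept, that committees column (lists of dicts) can be expanded afterwards: DataFrame.explode gives one row per list element (an empty list becomes a single NaN row), and pd.json_normalize then flattens the dicts into columns. This is a sketch on a small hypothetical stand-in for the merged frame; the second bill id is invented purely to show a bill with no committees:

```python
import pandas as pd

# Hypothetical stand-in for the merged frame: one row per bill,
# with committees stored as lists of dicts
df = pd.DataFrame({
    "bill_id": ["hr145-109", "no-committee-example"],
    "committees": [
        [{"committee_id": "HSPW"}, {"committee_id": "HSPW", "subcommittee_id": "13"}],
        [],  # a bill with no committee referral
    ],
})

# one row per committee entry; an empty list becomes a single NaN row
exploded = df.explode("committees", ignore_index=True)

# json_normalize needs dicts, so turn the NaN placeholders into empty dicts
records = [c if isinstance(c, dict) else {} for c in exploded["committees"]]
flat = pd.json_normalize(records)

# put the flattened committee columns back next to the bill-level columns
result = pd.concat([exploded.drop(columns="committees"), flat], axis=1)
```

Unlike the record_path approach, this keeps bills without any committee as a row with NaN committee fields.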