How to convert nested dictionaries to a pandas DataFrame?
I want to convert the result of an API call to a DataFrame. The result of the API call is a nested dictionary, but the DataFrame I get from it is not in the shape I need.
In addition to json_normalize, I tried pd.DataFrame.from_dict, but so far without success. I also tried flattening the dictionary, but that did not help either.
I used the following call:
results = requests.get(url).json()
results
And the output was:
{'result': {'totalrows': 3124,
'rows': [{'rownum': 1,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1089387}]},
{'rownum': 2,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1086893}]},
{'rownum': 3,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1085803}]}
Then, to produce the DataFrame, I used the following code:
Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()
This was the output:
rownum values
0 1 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1089387}
1 2 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1086893}
2 3 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field':
'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'}, {'field':
'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}
However, I want to obtain a DataFrame with the following format:
You can use pandas.DataFrame.from_dict, but you first need to strip out the data you do not need. For each row, you only want to keep the field and value entries, turning every field name into a key and its value into the corresponding dictionary value. You can do that with a list comprehension:
# `results` is the nested dict returned by requests.get(url).json()
data = [{field["field"]: field["value"] for field in row["values"]}
        for row in results["result"]["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1089387},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1086893},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1085803
# }]
Once you have this list of dictionaries, you can pass it to the from_dict method:
import pandas as pd
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM AAPL
If you also want rownum as a column (or as the index):
# Read from the original `results` dict, not from the `data` list built earlier
data = [{**{field["field"]: field["value"] for field in row["values"]},
         "rownum": row["rownum"]}
        for row in results["result"]["rows"]]
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate rownum ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM 1 AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM 2 AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM 3 AAPL
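The snippet above keeps rownum as a regular column. For the "or index" case, a minimal sketch is shown here. The two-row `results` sample is hypothetical and only mirrors the structure of the API response, and the name `records` is introduced for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the structure of the API response above.
results = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}

# Build one flat dict per row, carrying rownum along as an ordinary key.
records = [
    {"rownum": row["rownum"], **{f["field"]: f["value"] for f in row["values"]}}
    for row in results["result"]["rows"]
]

# Promote rownum from a column to the index.
df = pd.DataFrame(records).set_index("rownum")
print(df)
```

With rownum as the index, a single row can be looked up by its number, for example `df.loc[2]`.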
A naive attempt using nested for loops:
import pandas as pd

# `json` is the nested dict shown under "JSON object" below
records = []
for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "issuetitle":
            issuetitle = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    # Collect one record per row; DataFrame.append was removed in pandas 2.0
    records.append({"rownum": rownum,
                    "querydate": querydate,
                    "issueid": issueid,
                    "ticker": ticker,
                    "companyname": companyname,
                    "issuetitle": issuetitle,
                    "filerid": filerid,
                    })
df = pd.DataFrame(records)
print(df)
JSON object:
json = {
"result": {
"totalrows": 3,
"rows": [
{
"rownum": 1,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1089387
}
]
},
{
"rownum": 2,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1086893
}
]
},
{
"rownum": 3,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1085803
}
]
}
]
}
}
Output:
rownum querydate issueid ticker companyname issuetitle filerid
0 1 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1089387
1 2 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1086893
2 3 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1085803
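Since the question started from json_normalize, here is a hedged sketch of how that function might be used for this structure. It assumes pandas 1.0 or later (where `pd.json_normalize` is available at the top level). The two-row `results` sample is hypothetical, and the names `long_df` and `wide_df` are introduced for illustration. The idea is to expand each entry of "values" into its own record, then pivot the field names into columns.

```python
import pandas as pd

# Hypothetical sample mirroring the structure of the API response above.
results = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}

# One record per field/value pair, with rownum repeated as metadata.
long_df = pd.json_normalize(
    results["result"]["rows"],
    record_path="values",
    meta=["rownum"],
)

# Turn each distinct field name into a column, one output row per rownum.
wide_df = long_df.pivot(index="rownum", columns="field", values="value")
print(wide_df)
```

Because the value column mixes strings and integers, the pivoted columns come back with object dtype; calling `wide_df.infer_objects()` afterwards may recover narrower types. Note that pivot raises an error if a field name occurs more than once within the same row.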