
How to convert nested dictionaries to a pandas DataFrame?

I want to transform the result of an API call into a DataFrame. The result of the API call is a nested dictionary, but the DataFrame I get from it is not in the shape I need.

Besides json_normalize, I tried pd.DataFrame.from_dict, but so far without success. I also tried flattening the dictionary, with no luck.

I used the following call:

import requests

results = requests.get(url).json()
results

And the output was:

{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}

Then, to produce the DataFrame, I used the following code:


import pandas as pd

Owners = results['result']['rows']
df1 = pd.json_normalize(Owners)
df1.head()

This was the output:

  rownum  values
0      1  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1089387}]

1      2  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1086893}]

2      3  [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
           {'field': 'issueid', 'value': 472683},
           {'field': 'ticker', 'value': 'AAPL'},
           {'field': 'companyname', 'value': 'APPLE INC'},
           {'field': 'issuetitle', 'value': 'COM'},
           {'field': 'filerid', 'value': 1085803}]

However, I want to obtain a DataFrame with the following format:

[Image: desired DataFrame format]
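The list-valued `values` column appears because json_normalize only flattens nested dicts; it keeps nested lists as cell values unless told to descend into them. One option not tried in the question: `pd.json_normalize` (pandas 1.0 or later) accepts `record_path` to explode the nested list and `meta` to carry `rownum` along, and pivoting that long table gives one column per field. This is a sketch on a two-row copy of the data above, not a tested solution for the full API response.

```python
import pandas as pd

# Two rows copied from the API output in the question (trimmed for brevity)
rows = [
    {"rownum": 1, "values": [
        {"field": "querydate", "value": "7/31/2019 3:19 PM"},
        {"field": "issueid", "value": 472683},
        {"field": "ticker", "value": "AAPL"},
        {"field": "companyname", "value": "APPLE INC"},
        {"field": "issuetitle", "value": "COM"},
        {"field": "filerid", "value": 1089387},
    ]},
    {"rownum": 2, "values": [
        {"field": "querydate", "value": "7/31/2019 3:19 PM"},
        {"field": "issueid", "value": 472683},
        {"field": "ticker", "value": "AAPL"},
        {"field": "companyname", "value": "APPLE INC"},
        {"field": "issuetitle", "value": "COM"},
        {"field": "filerid", "value": 1086893},
    ]},
]

# Long format: one row per {'field', 'value'} pair, with rownum copied down
long_df = pd.json_normalize(rows, record_path="values", meta="rownum")

# Wide format: one row per rownum, one column per field name
wide = (
    long_df.pivot(index="rownum", columns="field", values="value")
    .rename_axis(columns=None)  # drop the leftover 'field' label on the columns
    .reset_index()
)
print(wide)
```

Note that pivot sorts the new columns alphabetically, and because the `value` column mixes strings and integers, the pivoted columns have object dtype; `wide.infer_objects()` can be applied afterwards if numeric dtypes matter.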

You can use pandas.DataFrame.from_dict, but first you need to strip the unnecessary nesting from your data: for each row, you only want to keep each field name and its value. You can do this with a list comprehension:

# Build one flat dict per row, mapping each 'field' name to its 'value'
data = [{field["field"]: field["value"] for field in row["values"]}
        for row in results["result"]["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1089387},
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1086893},
#  {'querydate': '7/31/2019 3:19 PM',
#   'issueid': 472683,
#   'ticker': 'AAPL',
#   'companyname': 'APPLE INC',
#   'issuetitle': 'COM',
#   'filerid': 1085803}]

Once you have this list of dictionaries, you can pass it to the from_dict method:

import pandas as pd

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL
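Two properties of this step, shown here on a hypothetical two-record sample: the plain DataFrame constructor accepts the same list of dicts as from_dict, and if a row lacks a field, that cell becomes NaN rather than raising an error, so the comprehension is not tied to these six field names. In recent pandas versions the columns follow the order in which keys first appear; the alphabetical column order in the output above suggests an older pandas.

```python
import pandas as pd

# Hypothetical records in the flattened shape produced by the comprehension;
# the second one deliberately has no 'issuetitle'
data = [
    {"ticker": "AAPL", "issuetitle": "COM", "filerid": 1089387},
    {"ticker": "AAPL", "filerid": 1086893},
]

df = pd.DataFrame(data)  # same result as pd.DataFrame.from_dict(data)
print(list(df.columns))  # columns in first-seen key order
print(df)                # missing 'issuetitle' in row 1 shows up as NaN
```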

If you want to keep rownum as a column (or as the index):

# Merge each row's field/value pairs with its rownum
data = [{**{field["field"]: field["value"] for field in row["values"]},
         "rownum": row["rownum"]}
        for row in results["result"]["rows"]]

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate  rownum ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL
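The snippet above keeps rownum as an ordinary column. To use it as the index instead, `set_index` can be chained onto the constructor. A minimal sketch with a hypothetical two-row sample in the same shape as `results['result']['rows']`:

```python
import pandas as pd

# Hypothetical two-row sample with the same nesting as the API result
rows = [
    {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                             {"field": "filerid", "value": 1089387}]},
    {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                             {"field": "filerid", "value": 1086893}]},
]

data = [{"rownum": row["rownum"],
         **{f["field"]: f["value"] for f in row["values"]}}
        for row in rows]

# Move rownum out of the columns and into the index
df = pd.DataFrame(data).set_index("rownum")
print(df)
```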

A naive nested for-loop attempt:

import pandas as pd

records = []  # one dict per row; the DataFrame is built once at the end

for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "issuetitle":  # without this branch issuetitle stays None
            issuetitle = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    records.append({"rownum": rownum,
                    "querydate": querydate,
                    "issueid": issueid,
                    "ticker": ticker,
                    "companyname": companyname,
                    "issuetitle": issuetitle,
                    "filerid": filerid})

# Build the DataFrame once from the collected rows
df = pd.DataFrame(records)

print(df)
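DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so `df.append(...)` inside a loop fails on current versions. If you do want to build the result from one-row frames, `pd.concat` is the replacement; calling it once after the loop avoids re-copying the growing frame on every iteration. A sketch on a hypothetical two-row sample, reading fields generically instead of with an if/elif chain:

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as json["result"]["rows"]
rows = [
    {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                             {"field": "filerid", "value": 1089387}]},
    {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                             {"field": "filerid", "value": 1086893}]},
]

frames = []
for row in rows:
    record = {"rownum": row["rownum"]}
    for value_dict in row["values"]:
        record[value_dict["field"]] = value_dict["value"]
    # One single-row frame per API row
    frames.append(pd.DataFrame(record, index=[0]))

# A single pd.concat replaces repeated df.append(..., ignore_index=True)
df = pd.concat(frames, ignore_index=True)
print(df)
```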

JSON object:

json = {
    "result": {
        "totalrows": 3,
        "rows": [
            {
                "rownum": 1,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1089387
                    }
                ]
            },
            {
                "rownum": 2,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1086893
                    }
                ]
            },
            {
                "rownum": 3,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1085803
                    }
                ]
            }
        ]
    }
}

Output:

   rownum          querydate  issueid ticker companyname issuetitle  filerid
0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803
