

Convert json column into standard pandas dataframe

I have a pandas dataframe with a column in JSON format, as shown below.

id  date       gender  response
1   1/14/2021  M       "{'score':3,'reason':{'description':array(['a','b','c'])}"
2   5/16/2020  F       "{'score':4,'reason':{'description':array(['x','y','z'])}"

I want to convert this into a dataframe by flattening the dictionary in the response column. The dictionary is stored as a string in the database.

Is there an easy way in Python to convert the response column into a dictionary object and then flatten it into a dataframe like this:

id  date       gender  score  description
1   1/14/2021  M       3      a
1   1/14/2021  M       3      b
1   1/14/2021  M       3      c
2   5/16/2020  F       4      x
2   5/16/2020  F       4      y
2   5/16/2020  F       4      z

Given the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2],
        "date": ["1/14/2021", "5/16/2020"],
        "gender": ["M", "F"],
        "response": [
            "{'score':3,'reason':{'description':array(['a','b','c'])}",
            "{'score':4,'reason':{'description':array(['x','y','z'])}",
        ],
    }
)

You can define a recursive function to flatten the values in the response column:

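from numpy import ndarray


def flatten(data, new_data):
    """Recursive helper function.

    Args:
        data: nested dictionary.
        new_data: dictionary that collects the flattened key/value pairs
            (pass an empty dict on the first call).

    Returns:
        Flattened dictionary.
    """
    for key, value in data.items():
        if isinstance(value, list):
            # Recurse into dictionaries stored inside lists
            for item in value:
                if isinstance(item, dict):
                    flatten(item, new_data)
        elif isinstance(value, dict):
            # Recurse into nested dictionaries such as 'reason'
            flatten(value, new_data)
        elif isinstance(value, (str, int, ndarray)):
            # Leaf values (e.g. 'score' and 'description') are kept as-is
            new_data[key] = value
    return new_data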

Then use NumPy arrays to handle the array(...) values and Python's built-in eval function to turn the strings in the response column into dictionaries:

import numpy as np
from numpy import ndarray

# In your example, each string is missing its final closing brace, hence the + "}".
# "array" is rewritten as "np.array" so that eval builds NumPy arrays.
df["response"] = df["response"].apply(
    lambda x: flatten(eval(x.replace("array", "np.array") + "}"), {})
)
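Because eval runs whatever code the string contains, it is risky on strings that come out of a database. A minimal sketch of a safer, swapped-in approach, assuming each array(...) wraps a flat list literal as in your sample: strip the wrapper with a regular expression, then parse the result with ast.literal_eval, which accepts only Python literals. parse_response and ARRAY_PATTERN are hypothetical names.

```python
import ast
import re

# Hypothetical pattern: matches array([...]) and captures the list literal.
# Assumption: the list contains no nested brackets or parentheses.
ARRAY_PATTERN = re.compile(r"array\((\[[^\]]*\])\)")


def parse_response(text):
    """Hypothetical helper: turn one response string into a dict."""
    # Replace array([...]) with the bare list [...]
    cleaned = ARRAY_PATTERN.sub(r"\1", text)
    # The sample strings lack their final closing brace; add only what is missing
    missing = cleaned.count("{") - cleaned.count("}")
    cleaned += "}" * missing
    # literal_eval only accepts Python literals, so no code is executed
    return ast.literal_eval(cleaned)


record = parse_response("{'score':3,'reason':{'description':array(['a','b','c'])}")
print(record)  # {'score': 3, 'reason': {'description': ['a', 'b', 'c']}}
```

This yields plain lists rather than NumPy arrays, so the flatten helper above would need to treat list values as leaves (it currently only keeps str, int and ndarray values).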

# For each row, build a dataframe from the flattened dict and concat it
# with the non-nested columns; then concat all the per-row dataframes.
# reset_index(drop=True) on both parts gives them the same index 0..n-1,
# so the axis=1 concat lines up row by row; ffill() then repeats
# id, date and gender down the new rows.
new_df = pd.concat(
    [
        pd.concat(
            [
                pd.DataFrame(df.loc[idx, :])
                .T.drop(columns="response")
                .reset_index(drop=True),
                pd.DataFrame(df.loc[idx, "response"]).reset_index(drop=True),
            ],
            axis=1,
        ).ffill()
        for idx in df.index
    ]
).reset_index(drop=True)

So that:

print(new_df)
# Output
   id       date gender  score description
0   1  1/14/2021      M      3           a
1   1  1/14/2021      M      3           b
2   1  1/14/2021      M      3           c
3   2  5/16/2020      F      4           x
4   2  5/16/2020      F      4           y
5   2  5/16/2020      F      4           z
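If the strings are parsed into plain dicts and lists (for example with ast.literal_eval once the array(...) wrappers are removed), pandas can do the flattening and row expansion itself. The sketch below is an alternative, not the method used above: pd.json_normalize turns nested keys into dotted column names such as reason.description, and DataFrame.explode then produces one row per list element (the ignore_index argument needs pandas 1.1 or later). The already-parsed input is hardcoded here as an assumption.

```python
import pandas as pd

# Assumption: the response strings have already been parsed into dicts,
# with the arrays stored as plain Python lists.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "date": ["1/14/2021", "5/16/2020"],
        "gender": ["M", "F"],
        "response": [
            {"score": 3, "reason": {"description": ["a", "b", "c"]}},
            {"score": 4, "reason": {"description": ["x", "y", "z"]}},
        ],
    }
)

# Nested keys become dotted column names: "score", "reason.description"
flat = pd.json_normalize(df["response"].tolist())
flat = flat.rename(columns={"reason.description": "description"})

# Place the flattened columns next to the other columns, then emit one
# row per element of each description list
new_df = pd.concat([df.drop(columns="response"), flat], axis=1).explode(
    "description", ignore_index=True
)
print(new_df)
```

Compared with building one dataframe per row, this keeps the work vectorized and needs no index alignment or forward-fill step.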
