Convert pandas column of json-like strings to DataFrame

I have the following DataFrame that I get "as-is" from an API:

df = pd.DataFrame({'keys': {0: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
                            1: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
                            2: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '580'}, {'strip': '10/1/2022'}]",
                            3: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '545'}, {'strip': '10/1/2022'}]",
                            4: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '555'}, {'strip': '10/1/2022'}]"},
                   'value': {0: 353.3, 1: 25.8, 2: 336.65, 3: 366.05, 4: 20.8}})

>>> df
                                                keys   value
0  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  353.30
1  [{'contract': 'G'}, {'contract_type': 'P'}, {'...   25.80
2  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  336.65
3  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  366.05
4  [{'contract': 'G'}, {'contract_type': 'P'}, {'...   20.80

Each row of the "keys" column is a string (not JSON, as the values are enclosed in single quotes instead of double quotes). For example:

>>> df.at[0, 'keys']
"[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"
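
To make the quoting issue concrete, here is a minimal sketch of what happens when the string is handed to json.loads directly, and after the quote swap described below. The variable names are only for illustration, and the quote swap assumes no value contains an apostrophe of its own.

```python
import json

raw = "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"

# JSON requires double-quoted strings, so the raw text is rejected.
try:
    json.loads(raw)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False

# After swapping the quote style the same text parses as a list of dicts.
# Assumption: no value contains a literal single quote.
parsed = json.loads(raw.replace("'", '"'))
```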

I would like to convert the "keys" column to a DataFrame and append it to df as new columns.

I am currently doing:

  1. Replacing single quotes with double quotes and passing the result to json.loads, which reads it into a list of dictionaries with the structure below:
[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]
  2. Combining the dictionaries into a single dictionary with a dictionary comprehension:
{'contract': 'G', 'contract_type': 'C', 'strike': '560', 'strip': '10/1/2022'}
  3. Applying this to every row and calling the pd.DataFrame constructor on the result.
  4. Joining the result back to the original df.
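
The four steps can also be written out one at a time, which may be easier to read than a single expression. This is only a sketch on a two-row sample: parse_keys is a hypothetical helper name, and the quote replacement assumes no value contains an apostrophe.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "keys": [
        "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
        "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
    ],
    "value": [353.3, 25.8],
})


def parse_keys(s):
    """Hypothetical helper: turn one 'keys' string into a single flat dict."""
    # Step 1: swap the quote style so json.loads accepts the string.
    dicts = json.loads(s.replace("'", '"'))
    # Step 2: merge the single-entry dicts into one dict.
    return {k: v for d in dicts for k, v in d.items()}


# Step 3: apply to every row and build a DataFrame from the resulting dicts.
expanded = pd.DataFrame(df["keys"].apply(parse_keys).tolist(), index=df.index)
# Step 4: join back to the original frame, dropping the raw string column.
result = df.drop(columns="keys").join(expanded)
```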

In a single line, my code is:

>>> df.drop("keys", axis=1).join(pd.DataFrame(df["keys"].apply(lambda x: {k: v for d in json.loads(x.replace("'","\"")) for k, v in d.items()}).tolist()))

    value contract contract_type strike      strip
0  353.30        G             C    560  10/1/2022
1   25.80        G             P    585  10/1/2022
2  336.65        G             C    580  10/1/2022
3  366.05        G             C    545  10/1/2022
4   20.80        G             P    555  10/1/2022

I was wondering if there is a better way to do this.

You could use ast.literal_eval (from the standard library) to convert the strings to actual lists of dicts, and then use pd.json_normalize with record_path=[[]] to get the objects into a table format:

import ast
new_df = pd.json_normalize(df['keys'].apply(ast.literal_eval), record_path=[[]]).apply(lambda col: col.dropna().tolist())

Output:

>>> new_df
  contract contract_type strike      strip
0        G             C    560  10/1/2022
1        G             P    585  10/1/2022
2        G             C    580  10/1/2022
3        G             C    545  10/1/2022
4        G             P    555  10/1/2022
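
Since the question asks for the new columns to be appended to df, the parsed frame still has to be joined back. The sketch below shows one way to do that on a two-row sample; it merges the dicts with a plain comprehension rather than pd.json_normalize, so it illustrates the join step rather than reproducing the exact call above.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "keys": [
        "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
        "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
    ],
    "value": [353.3, 25.8],
})

# literal_eval parses each string as a Python literal (here a list of dicts).
parsed = df["keys"].apply(ast.literal_eval)
# Flatten each list of single-entry dicts into one dict per row.
merged = parsed.apply(lambda dicts: {k: v for d in dicts for k, v in d.items()})
# Reuse the original index so the join lines up row by row.
new_df = pd.DataFrame(merged.tolist(), index=df.index)
result = df.drop(columns="keys").join(new_df)
```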

An alternate solution would be to use string replacement to merge the separate dicts into one:

import ast
new_df = pd.DataFrame(df['keys'].str.replace("'}, {'", "', '", regex=False).apply(ast.literal_eval).str[0].tolist())

The output is the same as above.

Yet another option, this one using functools.reduce (also from the standard library):

import ast
import functools
# The dict merge operator `|` requires Python 3.9 or later.
new_df = pd.DataFrame(df['keys'].apply(ast.literal_eval).apply(lambda row: functools.reduce(lambda x, y: x | y, row)).tolist())
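
On Python versions before 3.9, where dicts do not support `|`, the same reduction can be sketched with dict unpacking. merge_dicts is a hypothetical helper name; passing an empty dict as the initial value also means an empty list yields {} instead of raising a TypeError.

```python
import ast
import functools

import pandas as pd


def merge_dicts(dicts):
    """Hypothetical helper: fold a list of dicts into one, later keys winning."""
    # {**acc, **d} builds a new merged dict; the {} initial value handles empty lists.
    return functools.reduce(lambda acc, d: {**acc, **d}, dicts, {})


keys = pd.Series([
    "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
    "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
])

new_df = pd.DataFrame(keys.apply(ast.literal_eval).apply(merge_dicts).tolist())
```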

You can use ast.literal_eval and collections.ChainMap to merge a list of dictionaries into a single dict.

import ast
from collections import ChainMap

df['keys'] = df['keys'].apply(ast.literal_eval).apply(lambda x: dict(ChainMap(*x)))

print(df)
                                               keys   value
0  {'strip': '10/1/2022', 'strike': '560', 'contr...  353.30
1  {'strip': '10/1/2022', 'strike': '585', 'contr...   25.80
2  {'strip': '10/1/2022', 'strike': '580', 'contr...  336.65
3  {'strip': '10/1/2022', 'strike': '545', 'contr...  366.05
4  {'strip': '10/1/2022', 'strike': '555', 'contr...   20.80

Then use .apply(pd.Series) to expand the column of dictionaries into separate columns, and use pd.concat to combine the result with the rest of the DataFrame:

df_ = pd.concat([df['keys'].apply(pd.Series), df['value']], axis=1)

print(df_)
       strip strike contract_type contract   value
0  10/1/2022    560             C        G  353.30
1  10/1/2022    585             P        G   25.80
2  10/1/2022    580             C        G  336.65
3  10/1/2022    545             C        G  366.05
4  10/1/2022    555             P        G   20.80
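
Two details of this approach may be worth a sketch. ChainMap iterates its maps from last to first, which is why the columns above come out reversed; reversing the list before merging restores the original order. Also, .apply(pd.Series) builds one Series per row and can be slow on large frames, so the example below builds the frame from a list of dicts instead. It runs on a two-row sample and the variable names are only for illustration.

```python
import ast
from collections import ChainMap

import pandas as pd

df = pd.DataFrame({
    "keys": [
        "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
        "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
    ],
    "value": [353.3, 25.8],
})

parsed = df["keys"].apply(ast.literal_eval)

# ChainMap iterates its maps last-to-first, so the merged keys come out reversed.
reversed_order = parsed.apply(lambda dicts: dict(ChainMap(*dicts)))
# Reversing the list first restores the original key order.
original_order = parsed.apply(lambda dicts: dict(ChainMap(*reversed(dicts))))

# Building the frame from a list of dicts avoids constructing one Series per row.
expanded = pd.DataFrame(original_order.tolist(), index=df.index)
result = pd.concat([expanded, df["value"]], axis=1)
```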
