How to flatten list of dictionaries in multiple columns of pandas dataframe

I have a dataframe in which every record stores a list of dictionaries, like this:

row prodect_id recommend_info
  0 XQ002      [{"recommend_key":"XXX567","recommend_point":50},
                {"recommend_key":"XXX236","recommend_point":20},
                {"recommend_key":"XXX090","recommend_point":35}]
  1 XQ003      [{"recommend_key":"XXX089","recommend_point":30},
                {"recommend_key":"XXX567","recommend_point":20}]

I want to flatten the lists of dictionaries so that the result looks like this:

row prodect_id recommend_info_recommend_key recommend_info_recommend_point
  0 XQ002      XXX567                       50
  1 XQ002      XXX236                       20
  2 XQ002      XXX090                       35
  3 XQ003      XXX089                       30
  4 XQ003      XXX567                       20

I know how to convert a single list of dictionaries into a dataframe, like this:

d = [{"recommend_key":"XXX089","recommend_point":30},
     {"recommend_key":"XXX567","recommend_point":20}]

df = pd.DataFrame(d)

row recommend_key recommend_point
  0 XXX089        30
  1 XXX567        20

But I don't know how to do this for a whole dataframe, when one column stores lists of dictionaries or when several columns do:

row  col_a  col_b                  col_c
  0  B001   [{"a":"b"},{"a":"c"}]  [{"y":11},{"a":"c"}]
  1  D009   [{"c":"o"},{"g":"c"}]  [{"y":11},{"a":"c"},{"l":"c"}]   
  2  G068   [{"c":"b"},{"a":"c"}]  [{"a":56},{"d":"c"}]
  3  C004   [{"d":"a"},{"b":"c"}]  [{"c":22},{"a":"c"},{"b":"c"}]
  4  F011   [{"h":"u"},{"d":"c"}]  [{"h":27},{"d":"c"}]

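I have a dataframe with multiple columns. One of the columns contains a list in each row, and each list holds dictionaries. I needed to expand the dictionaries and attach the resulting columns to the row they came from. Ricardo's answer mostly worked for me. I generalized it below: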

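import pandas as pd


def explode_column_from_list_dict(df_in, column_name_to_explode):
    # One row per list element; explode returns a new frame, so df_in is untouched.
    exploded = df_in.explode(column_name_to_explode)
    return pd.concat(
        [
            # All other columns, repeated once per list element.
            exploded.drop([column_name_to_explode], axis=1),
            # Each dictionary expanded into one column per key.
            exploded[column_name_to_explode].apply(pd.Series),
        ],
        axis=1,
    )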
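
As a possible alternative to the explode-based helper, pandas also ships `pd.json_normalize`, which can flatten nested records directly. The sketch below is not taken from the answers in this thread; it assumes the dataframe is small enough to convert to a list of records first and that every `recommend_info` cell is a list of dictionaries. The sample data is the question's.

```python
import pandas as pd

df = pd.DataFrame({
    "prodect_id": ["XQ002", "XQ003"],
    "recommend_info": [
        [{"recommend_key": "XXX567", "recommend_point": 50},
         {"recommend_key": "XXX236", "recommend_point": 20},
         {"recommend_key": "XXX090", "recommend_point": 35}],
        [{"recommend_key": "XXX089", "recommend_point": 30},
         {"recommend_key": "XXX567", "recommend_point": 20}],
    ],
})

# Turn each row into a plain dict, then let json_normalize walk the nested list.
flat_jn = pd.json_normalize(
    df.to_dict("records"),
    record_path="recommend_info",      # the list of dictionaries to unroll
    meta=["prodect_id"],               # columns to repeat for every list element
    record_prefix="recommend_info_",   # prefix for the keys of each dictionary
)
print(flat_jn)
```

Note that the `meta` columns are placed after the record columns, so the column order differs from the layout requested in the question.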

Try:

pd.concat([df.explode('recommend_info').drop(['recommend_info'], axis=1),
           df.explode('recommend_info')['recommend_info'].apply(pd.Series)],
          axis=1)
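
To get closer to the exact layout requested in the question, the same expression can be combined with `add_prefix` and `reset_index`. This is a minimal sketch that rebuilds the question's sample data; the prefix and the index reset are additions made here to match the desired column names and row numbering, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "prodect_id": ["XQ002", "XQ003"],
    "recommend_info": [
        [{"recommend_key": "XXX567", "recommend_point": 50},
         {"recommend_key": "XXX236", "recommend_point": 20},
         {"recommend_key": "XXX090", "recommend_point": 35}],
        [{"recommend_key": "XXX089", "recommend_point": 30},
         {"recommend_key": "XXX567", "recommend_point": 20}],
    ],
})

# Explode once and reuse the result for both halves of the concat.
exploded = df.explode("recommend_info")
flat = pd.concat(
    [
        exploded.drop(["recommend_info"], axis=1),
        # Prefix the dictionary keys with the name of the source column.
        exploded["recommend_info"].apply(pd.Series).add_prefix("recommend_info_"),
    ],
    axis=1,
).reset_index(drop=True)  # replace the repeated 0,0,0,1,1 index with 0..4
print(flat)
```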

You can repeat the same step for each column that holds a list of dictionaries.

Here is an example:

>>> df = pd.DataFrame({'a': [[{3: 4, 5: 6}, {3:8, 5: 1}],
...                          [{3:2, 5:4}, {3: 8, 5: 10}]],
...                    'b': ['X', "Y"]})
>>> df
                               a  b
0   [{3: 4, 5: 6}, {3: 8, 5: 1}]  X
1  [{3: 2, 5: 4}, {3: 8, 5: 10}]  Y
>>> df = pd.concat([df.explode('a').drop(['a'], axis=1),
...                 df.explode('a')['a'].apply(pd.Series)],
...                axis=1)
>>> df
   b  3   5
0  X  4   6
0  X  8   1
1  Y  2   4
1  Y  8  10
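
Repeating the step for several columns could be wrapped in a loop, roughly as sketched below. The function name `flatten_dict_list_columns` is hypothetical, and the sketch assumes every cell in the listed columns is a non-empty list of dictionaries. Because `col_b` and `col_c` in the question share keys such as `"a"`, each expanded column is prefixed with its source column name to avoid duplicate column names.

```python
import pandas as pd


def flatten_dict_list_columns(df, columns):
    """Explode each listed column in turn and expand its dictionaries (hypothetical helper)."""
    out = df
    for col in columns:
        # One row per list element, with a fresh 0..n-1 index so concat lines up.
        exploded = out.explode(col).reset_index(drop=True)
        # One column per dictionary key; missing keys become NaN.
        expanded = pd.DataFrame(exploded[col].tolist()).add_prefix(f"{col}_")
        out = pd.concat([exploded.drop(columns=[col]), expanded], axis=1)
    return out


# First two rows of the multi-column sample from the question.
df_multi = pd.DataFrame({
    "col_a": ["B001", "D009"],
    "col_b": [[{"a": "b"}, {"a": "c"}], [{"c": "o"}, {"g": "c"}]],
    "col_c": [[{"y": 11}, {"a": "c"}], [{"y": 11}, {"a": "c"}, {"l": "c"}]],
})

flat_multi = flatten_dict_list_columns(df_multi, ["col_b", "col_c"])
print(flat_multi)
```

Exploding columns one after another produces, for each source row, every combination of its list elements: the first row here has 2 x 2 entries and the second 2 x 3, so the result has 10 rows. If the lists should instead be paired position by position, a different approach is needed.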
