Expand pandas dataframe column of dict into dataframe columns

I have a Pandas DataFrame where one column is a Series of dicts, like this:

   colA  colB                                  colC
0     7     7  {'foo': 185, 'bar': 182, 'baz': 148}
1     2     8  {'foo': 117, 'bar': 103, 'baz': 155}
2     5    10  {'foo': 165, 'bar': 184, 'baz': 170}
3     3     2  {'foo': 121, 'bar': 151, 'baz': 187}
4     5     5  {'foo': 137, 'bar': 199, 'baz': 108}

I want the foo, bar and baz key-value pairs from the dicts to become columns in my DataFrame, so that I end up with this:

   colA  colB  foo  bar  baz
0     7     7  185  182  148
1     2     8  117  103  155
2     5    10  165  184  170
3     3     2  121  151  187
4     5     5  137  199  108

How do I do that?

TL;DR

df = df.drop('colC', axis=1).join(pd.DataFrame(df.colC.values.tolist()))

Elaborate answer

We start by importing Pandas and defining the DataFrame to work with:

import pandas as pd


df = pd.DataFrame({'colA': {0: 7, 1: 2, 2: 5, 3: 3, 4: 5},
                   'colB': {0: 7, 1: 8, 2: 10, 3: 2, 4: 5},
                   'colC': {0: {'foo': 185, 'bar': 182, 'baz': 148},
                            1: {'foo': 117, 'bar': 103, 'baz': 155},
                            2: {'foo': 165, 'bar': 184, 'baz': 170},
                            3: {'foo': 121, 'bar': 151, 'baz': 187},
                            4: {'foo': 137, 'bar': 199, 'baz': 108}}})

The column colC is a pd.Series of dicts. We can turn it into a pd.DataFrame by passing the list of dicts to the pd.DataFrame constructor, so that each dict becomes one row and each key becomes a column:

pd.DataFrame(df.colC.values.tolist())
# df.colC.apply(pd.Series)  # this also works, but it is slow

which gives the pd.DataFrame:

   foo  bar  baz
0  185  182  148
1  117  103  155
2  165  184  170
3  121  151  187
4  137  199  108
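As a quick sanity check, the two approaches mentioned in the code comment can be compared on a small sample. This is only a sketch: the names `col`, `fast`, `slow` and `same` are made up for illustration, and it checks that the results agree, not how long each takes.

```python
import pandas as pd

# A small Series of dicts, using the first two rows of the question's colC.
col = pd.Series([{'foo': 185, 'bar': 182, 'baz': 148},
                 {'foo': 117, 'bar': 103, 'baz': 155}])

# Approach 1: hand the list of dicts to the DataFrame constructor.
fast = pd.DataFrame(col.values.tolist())

# Approach 2: convert each dict to a pd.Series, one row at a time.
slow = col.apply(pd.Series)

# Both should contain the same columns and the same values.
same = (list(fast.columns) == list(slow.columns)
        and fast.values.tolist() == slow.values.tolist())
print(same)
```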

So all we need to do is:

  1. Turn colC into a pd.DataFrame
  2. Delete the original colC from df
  3. Join the converted colC with df

That can be done in a one-liner:

df = df.drop('colC', axis=1).join(pd.DataFrame(df.colC.values.tolist()))

The contents of df are now:

   colA  colB  foo  bar  baz
0     7     7  185  182  148
1     2     8  117  103  155
2     5    10  165  184  170
3     3     2  121  151  187
4     5     5  137  199  108
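One caveat worth noting: join aligns rows on the index, and a DataFrame built from a plain list gets a fresh 0..n-1 index. The one-liner therefore relies on df having that default index. The sketch below assumes a hypothetical frame with string labels ('x', 'y') to show the difference, and passes index=df.index to keep the rows aligned.

```python
import pandas as pd

# Hypothetical frame with a non-default index (labels chosen for illustration).
df = pd.DataFrame(
    {'colA': [7, 2],
     'colC': [{'foo': 185, 'bar': 182}, {'foo': 117, 'bar': 103}]},
    index=['x', 'y'],
)

# The expanded frame is indexed 0..1, so no labels match and the join yields NaN.
misaligned = df.drop(columns='colC').join(pd.DataFrame(df['colC'].tolist()))

# Reusing the original index keeps each expanded row next to its source row.
expanded = pd.DataFrame(df['colC'].tolist(), index=df.index)
aligned = df.drop(columns='colC').join(expanded)

print(misaligned)
print(aligned)
```

If the index carries no meaning, calling df.reset_index(drop=True) first is another option.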

I faced the same challenge recently and managed to do it manually using apply and join.

import pandas as pd

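def expand_dict_column(df: pd.DataFrame, column) -> pd.DataFrame:
    # Build one pd.Series per row from the dict in `column`, then join the
    # result onto the remaining columns; the input frame is left unchanged.
    return df.drop(columns=[column], inplace=False).join(
        df.apply(lambda x: pd.Series(x[column].values(), index=x[column].keys()), axis=1))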

For the columns in the question, it would look like this:

df.drop(columns=["colC"], inplace=False).join(
    df.apply(lambda x: pd.Series(x["colC"].values(), index=x["colC"].keys()), axis=1))
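For a runnable end-to-end illustration of the apply-and-join idea, here is a minimal sketch on the question's data. It is a slight variant rather than the exact code above: each row's Series is built directly from the dict with pd.Series(row[column]), and the helper name `expand_dict_column_sketch` is made up for this example.

```python
import pandas as pd


def expand_dict_column_sketch(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a new frame with the dicts in `column` expanded into columns."""
    # Row-wise apply: each dict becomes a Series whose index holds the keys.
    expanded = df.apply(lambda row: pd.Series(row[column]), axis=1)
    # drop() returns a copy by default, so the caller's frame is not modified.
    return df.drop(columns=[column]).join(expanded)


df = pd.DataFrame({
    'colA': [7, 2, 5, 3, 5],
    'colB': [7, 8, 10, 2, 5],
    'colC': [{'foo': 185, 'bar': 182, 'baz': 148},
             {'foo': 117, 'bar': 103, 'baz': 155},
             {'foo': 165, 'bar': 184, 'baz': 170},
             {'foo': 121, 'bar': 151, 'baz': 187},
             {'foo': 137, 'bar': 199, 'baz': 108}],
})

result = expand_dict_column_sketch(df, 'colC')
print(result)
```

Because the helper returns a new frame, the result has to be assigned (for example back to df) to take effect.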

Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.