
How to unstack a column of dictionaries in a pandas dataframe?

I have a dataframe of the following format:

import pandas as pd

df = pd.DataFrame(
                 {"company":["McDonalds","Arbys","Wendys"],
                  "City":["Dallas","Austin","Chicago"],
                  "Datetime":[{"11/23/2016":"1","09/06/2011":"2"},
                              {"02/23/2012":"1","04/06/2013":"2"},
                              {"10/23/2017":"1","05/06/2019":"2"}]})
df
>>>    company    City     Datetime
>>>    McDonalds  Dallas   {'11/23/2016': '1', '09/06/2011': '2'}
>>>    Arbys      Austin   {'02/23/2012': '1', '04/06/2013': '2'}
>>>    Wendys     Chicago  {'10/23/2017': '1', '05/06/2019': '2'}

In my real data, each dictionary in the "Datetime" column is stored as a string, so I must first read it into a Python dictionary using ast.literal_eval.
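For readers who have not used ast.literal_eval this way, here is a minimal sketch of that parsing step. The frame named raw and its string values are made up for illustration (they mimic what you might get after reading the data from a CSV file); only ast.literal_eval and Series.map are real API.

```python
import ast
import pandas as pd

# Hypothetical input: the dictionaries arrive as plain strings.
raw = pd.DataFrame({
    "company": ["McDonalds", "Arbys"],
    "City": ["Dallas", "Austin"],
    "Datetime": [
        "{'11/23/2016': '1', '09/06/2011': '2'}",
        "{'02/23/2012': '1', '04/06/2013': '2'}",
    ],
})

# literal_eval only evaluates Python literals (no arbitrary code),
# so each string is turned into a real dict.
raw["Datetime"] = raw["Datetime"].map(ast.literal_eval)
```

After this step the column holds dict objects, which is the form the sample dataframe above already has.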

I would like to unstack the dataframe based on the values in "Datetime" so that the output looks as follows:

df_out

>>>    company    City    Date            Value
>>>    McDonalds  Dallas  11/23/2016      1
>>>    McDonalds  Dallas  09/06/2011      2
>>>    Arbys      Austin  02/23/2012      1
>>>    Arbys      Austin  04/06/2013      2
>>>    Wendys     Chicago 10/23/2017      1
>>>    Wendys     Chicago 05/06/2019      2

I am a bit lost on this one. I know I will need to iterate over the rows and read each dictionary, so I had the idea of using df.iterrows() to create a namedtuple of each row's values that won't change, and then looping over the dictionary itself to attach the different datetime values. However, I am not sure this is the most efficient way. Any tips would be appreciated.
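For reference, the row-by-row idea described above could be sketched roughly as follows. This is not code from the question: it uses df.itertuples() (which already yields namedtuples) in place of df.iterrows(), and the names records and df_out are assumptions chosen to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["McDonalds", "Arbys", "Wendys"],
    "City": ["Dallas", "Austin", "Chicago"],
    "Datetime": [
        {"11/23/2016": "1", "09/06/2011": "2"},
        {"02/23/2012": "1", "04/06/2013": "2"},
        {"10/23/2017": "1", "05/06/2019": "2"},
    ],
})

# One output record per (row, dictionary entry) pair.
# The outer loop walks the rows as namedtuples; the inner loop walks each dict.
records = [
    {"company": row.company, "City": row.City, "Date": date, "Value": value}
    for row in df.itertuples(index=False)
    for date, value in row.Datetime.items()
]

df_out = pd.DataFrame(records, columns=["company", "City", "Date", "Value"])
```

This is a plain Python loop, so it is easy to read but will likely be slower on large frames than the vectorised approaches in the answers below.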

My try:

(df.drop('Datetime', axis=1)
  .merge(df.Datetime.agg(lambda x: pd.Series(x))
           .stack().reset_index(-1),
         left_index=True, 
         right_index=True
        )
   .rename(columns={'level_1':'Date', 0:'Value'})
)

Output:

     company     City        Date Value
0  McDonalds   Dallas  11/23/2016     1
0  McDonalds   Dallas  09/06/2011     2
1      Arbys   Austin  02/23/2012     1
1      Arbys   Austin  04/06/2013     2
2     Wendys  Chicago  10/23/2017     1
2     Wendys  Chicago  05/06/2019     2

I would flatten the dictionaries in Datetime and construct a new dataframe from them, then join it back to the original.

from itertools import chain
df1 = pd.DataFrame(chain.from_iterable(df.Datetime.map(dict.items)), 
                   index=df.index.repeat(df.Datetime.str.len()), 
                   columns=['Date', 'Val'])

Out[551]:
         Date Val
0  11/23/2016   1
0  09/06/2011   2
1  02/23/2012   1
1  04/06/2013   2
2  10/23/2017   1
2  05/06/2019   2

# Pass the column by keyword; the positional axis argument no longer works in pandas 2.x
df_final = df.drop(columns='Datetime').join(df1)

Out[554]:
     company     City        Date Val
0  McDonalds   Dallas  11/23/2016   1
0  McDonalds   Dallas  09/06/2011   2
1      Arbys   Austin  02/23/2012   1
1      Arbys   Austin  04/06/2013   2
2     Wendys  Chicago  10/23/2017   1
2     Wendys  Chicago  05/06/2019   2

Here is a clean solution:

Solution

df = df.set_index(['company', 'City'])
df_stack = (df['Datetime'].apply(pd.Series)
            .stack().reset_index()
           .rename(columns= {'level_2': 'Datetime', 0: 'val'}))

Output

print(df_stack.to_string())

     company     City    Datetime val
0  McDonalds   Dallas  11/23/2016   1
1  McDonalds   Dallas  09/06/2011   2
2      Arbys   Austin  02/23/2012   1
3      Arbys   Austin  04/06/2013   2
4     Wendys  Chicago  10/23/2017   1
5     Wendys  Chicago  05/06/2019   2
