How to unstack a column of dictionaries in a pandas DataFrame?
I have a dataframe of the following format:
import pandas as pd

df = pd.DataFrame(
    {"company": ["McDonalds", "Arbys", "Wendys"],
     "City": ["Dallas", "Austin", "Chicago"],
     "Datetime": [{"11/23/2016": "1", "09/06/2011": "2"},
                  {"02/23/2012": "1", "04/06/2013": "2"},
                  {"10/23/2017": "1", "05/06/2019": "2"}]})
df
>>> Company City Datetime
>>> McDonalds Dallas {'11/23/2016': '1', '09/06/2011':'2'}
>>> Arbys Austin {'02/23/2012': '1', '04/06/2013':'2'}
>>> Wendys Chicago {'10/23/2017': '1', '05/06/2019':'2'}
In my real data, the dictionary in the "Datetime" column is stored as a string, so I first have to parse it into a Python dictionary using ast.literal_eval.
I would like to unstack the dataframe based on the values in Datetime so that the output looks as follows:
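A minimal sketch of that parsing step, assuming each cell holds the string form of a dict. The frame `raw` and its contents are made up for illustration and are not the asker's actual data:

```python
import ast

import pandas as pd

# Hypothetical frame in which each Datetime cell is a string, not a dict
raw = pd.DataFrame({
    "company": ["McDonalds", "Arbys"],
    "City": ["Dallas", "Austin"],
    "Datetime": [
        "{'11/23/2016': '1', '09/06/2011': '2'}",
        "{'02/23/2012': '1', '04/06/2013': '2'}",
    ],
})

# ast.literal_eval only evaluates Python literals, so it is safer than eval
raw["Datetime"] = raw["Datetime"].apply(ast.literal_eval)

# Each cell is now a real dict and can be flattened or iterated
print(type(raw.loc[0, "Datetime"]))
```

After this step the column can be treated exactly like the dict column in the sample frame.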
df_out
>>> Company City Date Value
>>> McDonalds Dallas 11/23/2016 1
>>> McDonalds Dallas 09/06/2011 2
>>> Arbys Austin 02/23/2012 1
>>> Arbys Austin 04/06/2013 2
>>> Wendys Chicago 10/23/2017 1
>>> Wendys Chicago 05/06/2019 2
I am a bit lost on this one. I know I will need to iterate over the rows and read each dictionary, so I had the idea of using df.iterrows(), creating a namedtuple of the values in each row that don't change, and then looping over the dictionary itself to attach the different datetime values. However, I am not sure this is the most efficient way. Any tips would be appreciated.
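For reference, the row-by-row idea described above could be sketched roughly as follows. This is only an illustration, not a recommended solution: it uses df.itertuples() (which already yields namedtuples) in place of df.iterrows(), and the names `rows` and `df_out` are chosen here for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["McDonalds", "Arbys", "Wendys"],
    "City": ["Dallas", "Austin", "Chicago"],
    "Datetime": [
        {"11/23/2016": "1", "09/06/2011": "2"},
        {"02/23/2012": "1", "04/06/2013": "2"},
        {"10/23/2017": "1", "05/06/2019": "2"},
    ],
})

# One output row per (date, value) pair; company and City are repeated
rows = [
    (row.company, row.City, date, value)
    for row in df.itertuples(index=False)
    for date, value in row.Datetime.items()
]

df_out = pd.DataFrame(rows, columns=["company", "City", "Date", "Value"])
print(df_out)
```

This loops in plain Python, so it is easy to read but is unlikely to be the fastest option on a large frame.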
My attempt:
(df.drop('Datetime', axis=1)
.merge(df.Datetime.agg(lambda x: pd.Series(x))
.stack().reset_index(-1),
left_index=True,
right_index=True
)
.rename(columns={'level_1':'Date', 0:'Value'})
)
Output:
company City Date Value
0 McDonalds Dallas 11/23/2016 1
0 McDonalds Dallas 09/06/2011 2
1 Arbys Austin 02/23/2012 1
1 Arbys Austin 04/06/2013 2
2 Wendys Chicago 10/23/2017 1
2 Wendys Chicago 05/06/2019 2
I would flatten the dictionaries in Datetime and construct a new df from them. Finally, join it back to the original.
from itertools import chain
df1 = pd.DataFrame(chain.from_iterable(df.Datetime.map(dict.items)),
index=df.index.repeat(df.Datetime.str.len()),
columns=['Date', 'Val'])
Out[551]:
Date Val
0 11/23/2016 1
0 09/06/2011 2
1 02/23/2012 1
1 04/06/2013 2
2 10/23/2017 1
2 05/06/2019 2
# Pass columns= explicitly; the positional axis argument was removed in pandas 2.0
df_final = df.drop(columns='Datetime').join(df1)
Out[554]:
company City Date Val
0 McDonalds Dallas 11/23/2016 1
0 McDonalds Dallas 09/06/2011 2
1 Arbys Austin 02/23/2012 1
1 Arbys Austin 04/06/2013 2
2 Wendys Chicago 10/23/2017 1
2 Wendys Chicago 05/06/2019 2
Here is a clean solution:
Solution
df = df.set_index(['company', 'City'])
df_stack = (df['Datetime'].apply(pd.Series)
.stack().reset_index()
.rename(columns= {'level_2': 'Datetime', 0: 'val'}))
Output
print(df_stack.to_string())
company City Datetime val
0 McDonalds Dallas 11/23/2016 1
1 McDonalds Dallas 09/06/2011 2
2 Arbys Austin 02/23/2012 1
3 Arbys Austin 04/06/2013 2
4 Wendys Chicago 10/23/2017 1
5 Wendys Chicago 05/06/2019 2