I have a dataframe of the following format:
df = pd.DataFrame(
{"company":["McDonalds","Arbys","Wendys"],
"City":["Dallas","Austin","Chicago"],
"Datetime":[{"11/23/2016":"1","09/06/2011":"2"},
{"02/23/2012":"1","04/06/2013":"2"},
{"10/23/2017":"1","05/06/2019":"2"}]})
df
>>> company City Datetime
>>> McDonalds Dallas {'11/23/2016': '1', '09/06/2011':'2'}
>>> Arbys Austin {'02/23/2012': '1', '04/06/2013':'2'}
>>> Wendys Chicago {'10/23/2017': '1', '05/06/2019':'2'}
In my real data, the dictionary in the "Datetime" column is stored as a string, so I must first parse it into a Python dictionary using ast.literal_eval.
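For reference, the parsing step might look like the sketch below. The `raw` frame and its string values are made-up sample data for illustration only; the real column is assumed to hold dict-like strings in the same shape.

```python
import ast

import pandas as pd

# Hypothetical sample: "Datetime" holds string representations of dicts
raw = pd.DataFrame({
    "company": ["McDonalds", "Arbys"],
    "Datetime": [
        "{'11/23/2016': '1', '09/06/2011': '2'}",
        "{'02/23/2012': '1', '04/06/2013': '2'}",
    ],
})

# literal_eval safely evaluates each string into a real Python dict
raw["Datetime"] = raw["Datetime"].map(ast.literal_eval)
```

After this step each cell is an ordinary dict, so the approaches that work on dict-valued columns apply directly.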
I would like to unstack the dataframe based on the values in datetime so that the output looks as follows:
df_out
>>> company City Date Value
>>> McDonalds Dallas 11/23/2016 1
>>> McDonalds Dallas 09/06/2011 2
>>> Arbys Austin 02/23/2012 1
>>> Arbys Austin 04/06/2013 2
>>> Wendys Chicago 10/23/2017 1
>>> Wendys Chicago 05/06/2019 2
I am a bit lost on this one. I know I will need to iterate over the rows and read each dictionary, so I had the idea of using df.iterrows() and creating namedtuples of each row's values that won't change, then looping over the dictionary itself and attaching the different datetime values. I am not sure this is the most efficient way, though. Any tips would be appreciated.
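A minimal sketch of that row-by-row idea, for comparison with vectorized approaches. It uses `itertuples` rather than `iterrows` (a substitution, since `itertuples` already yields namedtuples) and assumes the "Datetime" cells are already real dicts:

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["McDonalds", "Arbys", "Wendys"],
    "City": ["Dallas", "Austin", "Chicago"],
    "Datetime": [
        {"11/23/2016": "1", "09/06/2011": "2"},
        {"02/23/2012": "1", "04/06/2013": "2"},
        {"10/23/2017": "1", "05/06/2019": "2"},
    ],
})

# One output record per (row, dict entry) pair; the fixed columns are repeated
records = [
    (row.company, row.City, date, value)
    for row in df.itertuples(index=False)
    for date, value in row.Datetime.items()
]

df_out = pd.DataFrame(records, columns=["company", "City", "Date", "Value"])
```

This builds the long-format frame in a single pass and avoids repeatedly appending to a dataframe, which is usually the slow part of loop-based solutions.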
My try:
(df.drop('Datetime', axis=1)
.merge(df.Datetime.apply(pd.Series)  # expand each dict into columns keyed by date
.stack().reset_index(-1),
left_index=True,
right_index=True
)
.rename(columns={'level_1':'Date', 0:'Value'})
)
Output:
company City Date Value
0 McDonalds Dallas 11/23/2016 1
0 McDonalds Dallas 09/06/2011 2
1 Arbys Austin 02/23/2012 1
1 Arbys Austin 04/06/2013 2
2 Wendys Chicago 10/23/2017 1
2 Wendys Chicago 05/06/2019 2
I would flatten the dictionaries in Datetime and construct a new df from them. Finally, join it back to the original.
from itertools import chain
df1 = pd.DataFrame(chain.from_iterable(df.Datetime.map(dict.items)),
index=df.index.repeat(df.Datetime.str.len()),
columns=['Date', 'Val'])
Out[551]:
Date Val
0 11/23/2016 1
0 09/06/2011 2
1 02/23/2012 1
1 04/06/2013 2
2 10/23/2017 1
2 05/06/2019 2
df_final = df.drop(columns='Datetime').join(df1)
Out[554]:
company City Date Val
0 McDonalds Dallas 11/23/2016 1
0 McDonalds Dallas 09/06/2011 2
1 Arbys Austin 02/23/2012 1
1 Arbys Austin 04/06/2013 2
2 Wendys Chicago 10/23/2017 1
2 Wendys Chicago 05/06/2019 2
Here is a clean solution:
Solution
df = df.set_index(['company', 'City'])
df_stack = (df['Datetime'].apply(pd.Series)
.stack().reset_index()
.rename(columns= {'level_2': 'Datetime', 0: 'val'}))
Output
print(df_stack.to_string())
company City Datetime val
0 McDonalds Dallas 11/23/2016 1
1 McDonalds Dallas 09/06/2011 2
2 Arbys Austin 02/23/2012 1
3 Arbys Austin 04/06/2013 2
4 Wendys Chicago 10/23/2017 1
5 Wendys Chicago 05/06/2019 2