
How to unstack a column of dictionaries in a pandas DataFrame?

I have a dataframe of the following format:

import pandas as pd

df = pd.DataFrame(
    {"company": ["McDonalds", "Arbys", "Wendys"],
     "City": ["Dallas", "Austin", "Chicago"],
     "Datetime": [{"11/23/2016": "1", "09/06/2011": "2"},
                  {"02/23/2012": "1", "04/06/2013": "2"},
                  {"10/23/2017": "1", "05/06/2019": "2"}]})
df
>>>    company    City             Datetime
>>>    McDonalds  Dallas  {'11/23/2016': '1', '09/06/2011': '2'}
>>>    Arbys      Austin  {'02/23/2012': '1', '04/06/2013': '2'}
>>>    Wendys     Chicago {'10/23/2017': '1', '05/06/2019': '2'}

Each value in the "Datetime" column is a string representation of a dictionary, so I must first parse it into a Python dictionary using ast.literal_eval.
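The parsing step can be sketched as follows. This is a minimal illustration, assuming the column holds valid Python dict literals as strings; the frame `raw` and the result `parsed` are hypothetical names that do not appear in the original post.

```python
import ast

import pandas as pd

# Hypothetical raw frame: "Datetime" holds dict literals stored as strings.
raw = pd.DataFrame({
    "company": ["McDonalds", "Arbys"],
    "City": ["Dallas", "Austin"],
    "Datetime": [
        "{'11/23/2016': '1', '09/06/2011': '2'}",
        "{'02/23/2012': '1', '04/06/2013': '2'}",
    ],
})

# ast.literal_eval only evaluates Python literals, so it is safer than eval.
parsed = raw.assign(Datetime=raw["Datetime"].map(ast.literal_eval))
```

After this step every entry of `parsed["Datetime"]` is a real dict, which is what the approaches in the answers expect.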

I would like to unstack the dataframe based on the entries in "Datetime" so that the output looks as follows:

df_out

>>>    company    City    Date            Value
>>>    McDonalds  Dallas  11/23/2016      1
>>>    McDonalds  Dallas  09/06/2011      2
>>>    Arbys      Austin  02/23/2012      1
>>>    Arbys      Austin  04/06/2013      2
>>>    Wendys     Chicago 10/23/2017      1
>>>    Wendys     Chicago 05/06/2019      2

I am a bit lost on this one. I know I will need to iterate over the rows and read each dictionary. My idea was to use df.iterrows(), create a namedtuple from the values in each row that do not change, and then loop over the dictionary itself, attaching each date and value. However, I am not sure this is the most efficient way. Any tips would be appreciated.
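The row-by-row idea described above could look roughly like this. It is only a sketch: it uses itertuples (which already yields namedtuples) instead of iterrows, and the names `records` and `df_out` are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"company": ["McDonalds", "Arbys", "Wendys"],
     "City": ["Dallas", "Austin", "Chicago"],
     "Datetime": [{"11/23/2016": "1", "09/06/2011": "2"},
                  {"02/23/2012": "1", "04/06/2013": "2"},
                  {"10/23/2017": "1", "05/06/2019": "2"}]})

# Build one output record per (row, dictionary item) pair.
records = [
    {"company": row.company, "City": row.City, "Date": date, "Value": value}
    for row in df.itertuples(index=False)
    for date, value in row.Datetime.items()
]
df_out = pd.DataFrame(records)
```

This is easy to read, but it loops in Python, so it may be slower than the pandas-based answers on large frames.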

My try:

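(df.drop('Datetime', axis=1)
   # Expand each dictionary into columns (one per date), then stack the
   # dates back into rows and move them out of the index into a column.
   .merge(df['Datetime'].apply(pd.Series)
            .stack().reset_index(-1),
          left_index=True,
          right_index=True)
   .rename(columns={'level_1': 'Date', 0: 'Value'})
)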

Output:

     company     City        Date Value
0  McDonalds   Dallas  11/23/2016     1
0  McDonalds   Dallas  09/06/2011     2
1      Arbys   Austin  02/23/2012     1
1      Arbys   Austin  04/06/2013     2
2     Wendys  Chicago  10/23/2017     1
2     Wendys  Chicago  05/06/2019     2

I would flatten the dictionaries in Datetime, construct a new dataframe from them, and finally join it back to the original.

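from itertools import chain

# Chain all (date, value) pairs into one flat sequence; repeat each original
# index label once per dictionary entry so the rows can be joined back later.
df1 = pd.DataFrame(chain.from_iterable(df.Datetime.map(dict.items)),
                   index=df.index.repeat(df.Datetime.str.len()),
                   columns=['Date', 'Val'])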

Out[551]:
         Date Val
0  11/23/2016   1
0  09/06/2011   2
1  02/23/2012   1
1  04/06/2013   2
2  10/23/2017   1
2  05/06/2019   2

# Pass axis by keyword; recent pandas versions no longer accept it positionally.
df_final = df.drop('Datetime', axis=1).join(df1)

Out[554]:
     company     City        Date Val
0  McDonalds   Dallas  11/23/2016   1
0  McDonalds   Dallas  09/06/2011   2
1      Arbys   Austin  02/23/2012   1
1      Arbys   Austin  04/06/2013   2
2     Wendys  Chicago  10/23/2017   1
2     Wendys  Chicago  05/06/2019   2

Here is a clean solution:

Solution

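# Keep the identifying columns in the index so they survive the reshape.
df = df.set_index(['company', 'City'])
# Expand each dictionary into columns, then stack the dates back into rows.
df_stack = (df['Datetime'].apply(pd.Series)
            .stack().reset_index()
            .rename(columns={'level_2': 'Datetime', 0: 'val'}))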

Output

print(df_stack.to_string())

     company     City    Datetime val
0  McDonalds   Dallas  11/23/2016   1
1  McDonalds   Dallas  09/06/2011   2
2      Arbys   Austin  02/23/2012   1
3      Arbys   Austin  04/06/2013   2
4     Wendys  Chicago  10/23/2017   1
5     Wendys  Chicago  05/06/2019   2
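Another option, not taken from the answers above, is DataFrame.explode. The sketch below assumes a pandas version where explode accepts ignore_index and assumes every dictionary has at least one entry; the names `exploded`, `pairs` and `df_out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"company": ["McDonalds", "Arbys", "Wendys"],
     "City": ["Dallas", "Austin", "Chicago"],
     "Datetime": [{"11/23/2016": "1", "09/06/2011": "2"},
                  {"02/23/2012": "1", "04/06/2013": "2"},
                  {"10/23/2017": "1", "05/06/2019": "2"}]})

# Turn each dictionary into a list of (date, value) pairs,
# then give every pair its own row with a fresh 0..n-1 index.
exploded = (
    df.assign(Datetime=df["Datetime"].map(lambda d: list(d.items())))
      .explode("Datetime", ignore_index=True)
)

# Split the (date, value) tuples into two columns and attach them.
pairs = pd.DataFrame(exploded["Datetime"].tolist(), columns=["Date", "Value"])
df_out = pd.concat([exploded.drop(columns="Datetime"), pairs], axis=1)
```

Because both pieces share the same fresh integer index, the final concat lines the rows up one to one.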
