
How to pivot columns to titles? - python pandas dataframe

I have a DataFrame like this:

Datetime               status   time
2020-03-28 22:14:08     start   0
2020-03-29 00:28:50     end     02:13:52
2020-03-29 07:15:10     start   0
2020-03-29 07:48:02     end     00:32:47

How can I convert it to the following?

start                    end                    time 
2020-03-28 22:14:08      2020-03-29 00:28:50   02:13:52
2020-03-29 07:15:10      2020-03-29 07:48:02   00:32:47

The idea is to build a helper Series by comparing status with 'start' and taking Series.cumsum, so that each start/end pair gets its own group number. Add it to a MultiIndex with DataFrame.set_index, reshape with DataFrame.unstack, and remove the unneeded column with DataFrame.drop, passing a tuple because the columns are now a MultiIndex. Finally, a list comprehension creates the new column names:

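# Number each start/end pair: every 'start' row increments the counter
df = (df.set_index([df['status'].eq('start').cumsum(), 'status'])
       .unstack()
       .drop(('time','start'), axis=1))

# Flatten the MultiIndex columns: Datetime columns take the status name
df.columns = [y if x == 'Datetime' else x for x, y in df.columns]
print(df)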
                        end                start      time
status                                                    
1       2020-03-29 00:28:50  2020-03-28 22:14:08  02:13:52
2       2020-03-29 07:48:02  2020-03-29 07:15:10  00:32:47
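
For anyone who wants to run the steps above end to end, here is a minimal, self-contained sketch. The names `group` and `wide` are only illustrative, and the final column reordering and index reset are additions (not part of the answer above) to match the layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": ["2020-03-28 22:14:08", "2020-03-29 00:28:50",
                 "2020-03-29 07:15:10", "2020-03-29 07:48:02"],
    "status": ["start", "end", "start", "end"],
    "time": ["0", "02:13:52", "0", "00:32:47"],
})

# Helper key (illustrative name): each 'start' row opens a new group,
# so the running count over True/False is 1, 1, 2, 2 for this data.
group = df["status"].eq("start").cumsum().rename("group")

wide = (
    df.set_index([group, "status"])       # MultiIndex rows: (group, status)
      .unstack()                          # columns become (Datetime|time, end|start)
      .drop(("time", "start"), axis=1)    # 'start' rows carry no duration
)

# Flatten the column MultiIndex: Datetime columns are named after the status.
wide.columns = [status if top == "Datetime" else top
                for top, status in wide.columns]

# Added step: put the columns in the order from the question, drop the group index.
wide = wide[["start", "end", "time"]].reset_index(drop=True)
print(wide)
```

Because the grouping key comes from the status column itself rather than from row positions, this variant does not depend on the row labels of the original frame.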

Another idea, if the rows always come in matching start, end pairs: select the even and odd positions with Series.iloc, create a default index with Series.reset_index, and join the pieces with concat:

s = df['Datetime'].iloc[::2].rename('start').reset_index(drop=True)
e = df['Datetime'].iloc[1::2].rename('end').reset_index(drop=True)
t = df['time'].iloc[1::2].reset_index(drop=True)

df = pd.concat([s, e, t], axis=1)
print(df)
                 start                  end      time
0  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
1  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
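
The positional approach silently produces wrong pairs if a start or end row is missing. As a rough sketch, the same slicing can be wrapped in a function that first checks the alternation. The function name `pair_by_position` and the choice to raise `ValueError` are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd


def pair_by_position(df: pd.DataFrame) -> pd.DataFrame:
    """Pair rows by position, assuming strict start, end, start, end order."""
    alternating = (
        len(df) % 2 == 0
        and df["status"].iloc[::2].eq("start").all()
        and df["status"].iloc[1::2].eq("end").all()
    )
    if not alternating:
        raise ValueError("rows must alternate start, end, start, end, ...")

    # Even positions are starts, odd positions are ends.
    s = df["Datetime"].iloc[::2].rename("start").reset_index(drop=True)
    e = df["Datetime"].iloc[1::2].rename("end").reset_index(drop=True)
    t = df["time"].iloc[1::2].reset_index(drop=True)
    return pd.concat([s, e, t], axis=1)


df = pd.DataFrame({
    "Datetime": ["2020-03-28 22:14:08", "2020-03-29 00:28:50",
                 "2020-03-29 07:15:10", "2020-03-29 07:48:02"],
    "status": ["start", "end", "start", "end"],
    "time": ["0", "02:13:52", "0", "00:32:47"],
})
pairs = pair_by_position(df)
print(pairs)
```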

Although @jezrael's answer is obviously awesome, here is a different way that you can try. It uses indexing.

import pandas as pd
a = pd.DataFrame({
    'Datetime': ['2020-03-28 22:14:08', '2020-03-29 00:28:50',
                 '2020-03-29 07:15:10', '2020-03-29 07:48:02'],
    'status': ['start', 'end', 'start', 'end'],
    'time': ['0', '02:13:52', '0', '00:32:47'],
})

a.set_index('status',inplace=True)

c = pd.DataFrame(columns=['start','end','time'])
c['start'] = a.loc['start', 'Datetime'].values
c['end'] = a.loc['end', 'Datetime'].values
c['time'] = a.loc['end', 'time'].values
print(c)

Output:

                 start                  end      time
0  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
1  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
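
A possible variation on the same idea, sketched here with boolean masks instead of `set_index(..., inplace=True)`, leaves the original frame `a` unmodified and builds the result in a single constructor call. The names `starts` and `ends` are illustrative. Like the code above, it assumes there are as many start rows as end rows and that they appear in matching order; with unequal counts the constructor raises a `ValueError`.

```python
import pandas as pd

a = pd.DataFrame({
    "Datetime": ["2020-03-28 22:14:08", "2020-03-29 00:28:50",
                 "2020-03-29 07:15:10", "2020-03-29 07:48:02"],
    "status": ["start", "end", "start", "end"],
    "time": ["0", "02:13:52", "0", "00:32:47"],
})

# Split the rows by status without touching the index of `a`.
starts = a[a["status"] == "start"]
ends = a[a["status"] == "end"]

# to_numpy() drops the row labels so the columns line up by position.
c = pd.DataFrame({
    "start": starts["Datetime"].to_numpy(),
    "end": ends["Datetime"].to_numpy(),
    "time": ends["time"].to_numpy(),
})
print(c)
```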

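Here you go. Shift the Datetime column down by one row so that each end row also carries the preceding start, then keep only the end rows: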

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
Datetime,status,time
2020-03-28 22:14:08,start,0
2020-03-29 00:28:50,end,02:13:52
2020-03-29 07:15:10,start,0
2020-03-29 07:48:02,end,00:32:47"""))
df['start'] = df['Datetime'].shift()
df = df[df['status'] == 'end'][['start', 'Datetime', 'time']]
df = df.rename(columns={'Datetime': 'end'})
print(df)

Output:

                 start                  end      time
1  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
3  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
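
The shift approach keeps the original row labels (1 and 3) and pairs every end row with whatever row precedes it. A hedged sketch of a slightly stricter version follows: it only keeps end rows whose previous row is a start, and resets the index. The names `prev`, `paired`, and `result` are illustrative, and the extra check is an addition rather than part of the answer above.

```python
import pandas as pd
from io import StringIO

csv_text = """Datetime,status,time
2020-03-28 22:14:08,start,0
2020-03-29 00:28:50,end,02:13:52
2020-03-29 07:15:10,start,0
2020-03-29 07:48:02,end,00:32:47"""
df = pd.read_csv(StringIO(csv_text))

# Every column moved down one row: prev holds the previous row's values.
prev = df.shift()

# Keep an 'end' row only when the row right before it is a 'start'.
paired = df["status"].eq("end") & prev["status"].eq("start")

result = (
    df.loc[paired, ["Datetime", "time"]]
      .rename(columns={"Datetime": "end"})
      .assign(start=prev.loc[paired, "Datetime"])   # aligned on row labels
      .loc[:, ["start", "end", "time"]]
      .reset_index(drop=True)
)
print(result)
```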

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

 