I have a DataFrame like this:
Datetime             status  time
2020-03-28 22:14:08  start   0
2020-03-29 00:28:50  end     02:13:52
2020-03-29 07:15:10  start   0
2020-03-29 07:48:02  end     00:32:47
How can I convert it to the following?
start                end                  time
2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
The idea is to create a helper Series by comparing status with 'start' and taking Series.cumsum, so that each start/end pair gets its own group number. Add that helper to a MultiIndex with DataFrame.set_index, reshape with DataFrame.unstack, and remove the unneeded column with DataFrame.drop (passing a tuple, because the columns are now a MultiIndex). Finally, build the new column names in a list comprehension:
df = (df.set_index([df['status'].eq('start').cumsum(), 'status'])
        .unstack()
        .drop(('time', 'start'), axis=1))
# Flatten the MultiIndex columns: keep 'start'/'end' for Datetime, 'time' otherwise
df.columns = [y if x == 'Datetime' else x for x, y in df.columns]
print(df)
                        end                start      time
status
1       2020-03-29 00:28:50  2020-03-28 22:14:08  02:13:52
2       2020-03-29 07:48:02  2020-03-29 07:15:10  00:32:47
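The output above has the columns in alphabetical order and keeps the helper group number as the index. As a minimal sketch, assuming the sample data from the question (rebuilt here by hand) and using the illustrative names `group`, `wide` and `result`, which are not part of the original answer, the same steps can be run end to end and then reordered to the layout asked for:

```python
import pandas as pd

# Sample data copied from the question (all values kept as strings).
df = pd.DataFrame({
    'Datetime': ['2020-03-28 22:14:08', '2020-03-29 00:28:50',
                 '2020-03-29 07:15:10', '2020-03-29 07:48:02'],
    'status': ['start', 'end', 'start', 'end'],
    'time': ['0', '02:13:52', '0', '00:32:47'],
})

# Helper Series: increases by one at every 'start' row, so both rows
# of a start/end pair share the same group number.
group = df['status'].eq('start').cumsum().rename('group')

wide = (df.set_index([group, 'status'])
          .unstack()
          .drop(('time', 'start'), axis=1))
wide.columns = [y if x == 'Datetime' else x for x, y in wide.columns]

# Reorder the columns and replace the group index with a default one.
result = wide[['start', 'end', 'time']].reset_index(drop=True)
print(result)
```

Naming the helper Series `group` only changes the index label; the reshaping itself is unchanged from the answer.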
Another idea, if start and end always come in matching pairs, is to select the even and odd rows by position with Series.iloc, create a default index with Series.reset_index, and join the pieces together with concat:
# Even positions are the 'start' rows, odd positions are the 'end' rows
s = df['Datetime'].iloc[::2].rename('start').reset_index(drop=True)
e = df['Datetime'].iloc[1::2].rename('end').reset_index(drop=True)
t = df['time'].iloc[1::2].reset_index(drop=True)
df = pd.concat([s, e, t], axis=1)
print(df)
                 start                  end      time
0  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
1  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
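The positional slicing only gives correct results when the rows really do alternate start, end, start, end. A rough sketch of a check you could run first, assuming the sample data from the question (the names `starts_ok`, `ends_ok` and `paired` are made up for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Datetime': ['2020-03-28 22:14:08', '2020-03-29 00:28:50',
                 '2020-03-29 07:15:10', '2020-03-29 07:48:02'],
    'status': ['start', 'end', 'start', 'end'],
    'time': ['0', '02:13:52', '0', '00:32:47'],
})

# Every even position must be a 'start' row, every odd position an 'end' row.
starts_ok = df['status'].iloc[::2].eq('start').all()
ends_ok = df['status'].iloc[1::2].eq('end').all()
paired = bool(len(df) % 2 == 0 and starts_ok and ends_ok)

if not paired:
    raise ValueError("rows are not strictly alternating start/end pairs")
print(paired)
```

If the check fails, the cumsum-based approach is probably the safer starting point, since it groups rows by where each 'start' occurs rather than by position.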
Although @jezrael's answer is obviously awesome, here is a different way that you can try. It uses indexing.
import pandas as pd
a = pd.DataFrame({'Datetime': ['2020-03-28 22:14:08', '2020-03-29 00:28:50', '2020-03-29 07:15:10', '2020-03-29 07:48:02'], 'status': ['start', 'end', 'start', 'end'], 'time': ['0', '02:13:52', '0', '00:32:47']})
a.set_index('status',inplace=True)
c = pd.DataFrame(columns=['start','end','time'])
c['start'] = a.loc['start']['Datetime'].values
c['end'] = a.loc['end']['Datetime'].values
c['time'] = a.loc['end']['time'].values
print(c)
Output:
                 start                  end      time
0  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
1  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
Here you go:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""
Datetime,status,time
2020-03-28 22:14:08,start,0
2020-03-29 00:28:50,end,02:13:52
2020-03-29 07:15:10,start,0
2020-03-29 07:48:02,end,00:32:47"""))
df['start'] = df['Datetime'].shift()
df = df[df['status'] == 'end'][['start', 'Datetime', 'time']]
df = df.rename(columns={'Datetime': 'end'})
print(df)
Output:
                 start                  end      time
1  2020-03-28 22:14:08  2020-03-29 00:28:50  02:13:52
3  2020-03-29 07:15:10  2020-03-29 07:48:02  00:32:47
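Note that the result keeps the original row labels (1 and 3) of the 'end' rows. If a default 0..n-1 index is wanted, one possible variation is to add Series/DataFrame `reset_index(drop=True)` at the end. This sketch assumes the same sample data, built directly rather than read from CSV, and uses `out` as an illustrative name:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Datetime': ['2020-03-28 22:14:08', '2020-03-29 00:28:50',
                 '2020-03-29 07:15:10', '2020-03-29 07:48:02'],
    'status': ['start', 'end', 'start', 'end'],
    'time': ['0', '02:13:52', '0', '00:32:47'],
})

# Copy the previous row's Datetime next to each row, then keep only 'end' rows.
df['start'] = df['Datetime'].shift()
out = (df[df['status'] == 'end'][['start', 'Datetime', 'time']]
         .rename(columns={'Datetime': 'end'})
         .reset_index(drop=True))  # replace labels 1, 3 with 0, 1
print(out)
```

Like the positional approach, this relies on every 'end' row being directly preceded by its 'start' row.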