How to make previous and next columns in a dataframe from an existing dataframe?
So, let's say I have a data frame like this:
import pandas as pd

df = pd.DataFrame({'person': ['A', 'A', 'B', 'B', 'A'],
                   'datetime': ['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10', '2018-02-26 10:58:45', '2018-02-26 10:43:34'],
                   'location': ['a', 'b', 'c', 'd', 'e']})
That shows:
person datetime location
A 2018-02-26 10:49:32 a
A 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c
B 2018-02-26 10:58:45 d
A 2018-02-26 10:43:34 e
Then I sorted the rows by person and time:
df.sort_values(by=['person', 'datetime'])
This orders the rows by person and then by time, so each person's movements appear in sequence:
person datetime location
4 A 2018-02-26 10:43:34 e
0 A 2018-02-26 10:49:32 a
1 A 2018-02-26 10:58:03 b
2 B 2018-02-26 10:51:10 c
3 B 2018-02-26 10:58:45 d
This can be read as: person A goes from place e to a, then to b. Meanwhile, person B goes from place c to place d.
I want to create a dataframe that tracks each person's movement, like this:
| person | prev_datetime | prev_loc | next_datetime | next_loc |
|--------|---------------------|----------|---------------------|----------|
| A | 2018-02-26 10:43:34 | e | 2018-02-26 10:49:32 | a |
| A | 2018-02-26 10:49:32 | a | 2018-02-26 10:58:03 | b |
| B | 2018-02-26 10:51:10 | c | 2018-02-26 10:58:45 | d |
I don't really have any idea how to do this. Thanks.
Use DataFrameGroupBy.shift on the 2 columns, then remove the last row of each person by applying Series.duplicated to the person column, and rename the columns:
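df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])
# select both columns with a list; tuple-style ['datetime','location'] fails in pandas 2.x
df1[['next_datetime','next_loc']] = df1.groupby('person')[['datetime','location']].shift(-1)
d = {'datetime':'prev_datetime','location':'prev_loc'}
# keep every row except the last one of each person (it has no next location)
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
print(df2)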
person prev_datetime prev_loc next_datetime next_loc
4 A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a
0 A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
2 B 2018-02-26 10:51:10 c 2018-02-26 10:58:45 d
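For anyone who wants to run the whole thing in one go, here is a minimal self-contained sketch of the same approach. The function name `movements` is hypothetical (it is not part of pandas or of the answer above), and the two shifted columns are assigned one at a time here, which is just a stylistic variation on the multi-column assignment.

```python
import pandas as pd


def movements(df):
    """Pair each row with the next row of the same person (hypothetical helper)."""
    out = df.copy()
    out['datetime'] = pd.to_datetime(out['datetime'])
    out = out.sort_values(by=['person', 'datetime'])

    # Shift within each person: row i receives the values of row i+1.
    # The last row of every person gets NaT / NaN because nothing follows it.
    grouped = out.groupby('person')
    out['next_datetime'] = grouped['datetime'].shift(-1)
    out['next_loc'] = grouped['location'].shift(-1)

    # duplicated(keep='last') is True for every row except the last
    # occurrence of each person, so this drops the rows with no "next".
    out = out[out['person'].duplicated(keep='last')]
    out = out.rename(columns={'datetime': 'prev_datetime',
                              'location': 'prev_loc'})
    return out.reset_index(drop=True)


df = pd.DataFrame({'person': ['A', 'A', 'B', 'B', 'A'],
                   'datetime': ['2018-02-26 10:49:32', '2018-02-26 10:58:03',
                                '2018-02-26 10:51:10', '2018-02-26 10:58:45',
                                '2018-02-26 10:43:34'],
                   'location': ['a', 'b', 'c', 'd', 'e']})

result = movements(df)
print(result)
```

Note that a person who appears only once has no movement to report, so that person is dropped entirely by the `duplicated(keep='last')` filter.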