Pandas GroupBy and sort_values don't work as expected
I have a dataset with these columns:
Unnamed: 0 int64
id object
number_from object
number_to object
time object
I applied this function to it:
data1.groupby('number_from').apply(lambda x: x.sort_values('time'))
I get something like this:
Unnamed id number_from number_to time
17699 d20b3e 934 674 2017-07-03 06:36:20.000
17700 d20b81 934 674 2017-07-03 06:36:22.000
17701 d20b96 934 674 2017-07-03 06:36:23.000
**17703 d20c17 612 235 2017-07-03 06:36:28.000**
17707 d20db5 934 658 2017-07-03 06:36:45.000
17708 d20de9 934 658 2017-07-03 06:36:47.000
17710 d20e05 934 658 2017-07-03 06:36:49.000
17711 d20e41 934 658 2017-07-03 06:36:51.000
17712 d20e73 934 658 2017-07-03 06:36:53.000
17713 d20ecc 934 702 2017-07-03 06:36:57.000
17714 d20ef1 934 702 2017-07-03 06:36:59.000
17715 d20f32 934 702 2017-07-03 06:37:01.000
17716 d20f77 934 702 2017-07-03 06:37:03.000
17717 d20f8d 934 702 2017-07-03 06:37:05.000
17718 d20fd8 934 262 2017-07-03 06:37:08.000
17719 d21017 934 262 2017-07-03 06:37:11.000
17720 d21032 934 262 2017-07-03 06:37:12.000
17721 d2103e 934 262 2017-07-03 06:37:13.000
17722 d2106d 934 262 2017-07-03 06:37:15.000
**17723 d210c4 396 048 2017-07-03 06:37:19.000**
17725 d21147 934 691 2017-07-03 06:37:24.000
17726 d21167 934 691 2017-07-03 06:37:26.000
Note: this is just a subset of the values in my dataframe. It only sorted the values; it did not group them.
I want to group the rows and sort the time values within each group independently of the other groups.
I also tried
data1.sort_values(['time'], ascending=False).groupby('number_from')
but the result is the same. What am I doing wrong?
I believe you need to sort by multiple columns together, but first convert the time column to datetimes:
data1['time'] = pd.to_datetime(data1['time'])
# if number_from needs to be compared as integers
data1['number_from'] = data1['number_from'].astype(int)
data1.sort_values(['number_from','time'])
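To make the steps above easy to try, here is a minimal self-contained sketch. The small data1 frame is a hypothetical sample built from a few rows shown in the question, not the real dataset, and the variable name result is only for illustration. Note that sort_values returns a new DataFrame, so the result has to be assigned to be kept.

```python
import pandas as pd

# Hypothetical sample: four rows copied from the question's output,
# with every column stored as strings (object dtype), as in the question.
data1 = pd.DataFrame(
    {
        "id": ["d20c17", "d20b81", "d210c4", "d20b3e"],
        "number_from": ["612", "934", "396", "934"],
        "number_to": ["235", "674", "048", "674"],
        "time": [
            "2017-07-03 06:36:28.000",
            "2017-07-03 06:36:22.000",
            "2017-07-03 06:37:19.000",
            "2017-07-03 06:36:20.000",
        ],
    },
    index=[17703, 17700, 17723, 17699],
)

# Convert the strings to real datetimes and integers before sorting.
data1["time"] = pd.to_datetime(data1["time"])
data1["number_from"] = data1["number_from"].astype(int)

# Sort by group key first, then by time within each key.
# sort_values does not modify data1; it returns a sorted copy.
result = data1.sort_values(["number_from", "time"])
print(result)
```

Sorting on both columns at once keeps rows with the same number_from together and orders them by time inside each block, which is what the per-group sort was meant to do.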
EDIT:
I tested both solutions and they work the same:
data1['time'] = pd.to_datetime(data1['time'])
df1 = data1.sort_values(['number_from','time'])
print (df1)
id number_from number_to time
Unnamed
17723 d210c4 396 48 2017-07-03 06:37:19
17703 d20c17 612 235 2017-07-03 06:36:28
17699 d20b3e 934 674 2017-07-03 06:36:20
17700 d20b81 934 674 2017-07-03 06:36:22
17701 d20b96 934 674 2017-07-03 06:36:23
17707 d20db5 934 658 2017-07-03 06:36:45
17708 d20de9 934 658 2017-07-03 06:36:47
17710 d20e05 934 658 2017-07-03 06:36:49
17711 d20e41 934 658 2017-07-03 06:36:51
17712 d20e73 934 658 2017-07-03 06:36:53
17713 d20ecc 934 702 2017-07-03 06:36:57
17714 d20ef1 934 702 2017-07-03 06:36:59
17715 d20f32 934 702 2017-07-03 06:37:01
17716 d20f77 934 702 2017-07-03 06:37:03
17717 d20f8d 934 702 2017-07-03 06:37:05
17718 d20fd8 934 262 2017-07-03 06:37:08
17719 d21017 934 262 2017-07-03 06:37:11
17720 d21032 934 262 2017-07-03 06:37:12
17721 d2103e 934 262 2017-07-03 06:37:13
17722 d2106d 934 262 2017-07-03 06:37:15
17725 d21147 934 691 2017-07-03 06:37:24
17726 d21167 934 691 2017-07-03 06:37:26
I added the parameter group_keys=False to avoid creating an index level from the number_from column:
df2 = data1.groupby('number_from', group_keys=False).apply(lambda x: x.sort_values('time'))
print (df2)
id number_from number_to time
Unnamed
17723 d210c4 396 48 2017-07-03 06:37:19
17703 d20c17 612 235 2017-07-03 06:36:28
17699 d20b3e 934 674 2017-07-03 06:36:20
17700 d20b81 934 674 2017-07-03 06:36:22
17701 d20b96 934 674 2017-07-03 06:36:23
17707 d20db5 934 658 2017-07-03 06:36:45
17708 d20de9 934 658 2017-07-03 06:36:47
17710 d20e05 934 658 2017-07-03 06:36:49
17711 d20e41 934 658 2017-07-03 06:36:51
17712 d20e73 934 658 2017-07-03 06:36:53
17713 d20ecc 934 702 2017-07-03 06:36:57
17714 d20ef1 934 702 2017-07-03 06:36:59
17715 d20f32 934 702 2017-07-03 06:37:01
17716 d20f77 934 702 2017-07-03 06:37:03
17717 d20f8d 934 702 2017-07-03 06:37:05
17718 d20fd8 934 262 2017-07-03 06:37:08
17719 d21017 934 262 2017-07-03 06:37:11
17720 d21032 934 262 2017-07-03 06:37:12
17721 d2103e 934 262 2017-07-03 06:37:13
17722 d2106d 934 262 2017-07-03 06:37:15
17725 d21147 934 691 2017-07-03 06:37:24
17726 d21167 934 691 2017-07-03 06:37:26
print (df1.equals(df2))
True
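If you want to confirm that the result really is ordered by time inside every group, a check along these lines may help. This is only a sketch: the df1 frame here is a small hypothetical sample rather than the full data, and the names ordered, all_ordered and contiguous are made up for illustration.

```python
import pandas as pd

# Hypothetical sample, already sorted by group key and then by time.
df1 = pd.DataFrame(
    {
        "number_from": [934, 612, 934, 396, 934],
        "time": pd.to_datetime(
            [
                "2017-07-03 06:36:22",
                "2017-07-03 06:36:28",
                "2017-07-03 06:36:20",
                "2017-07-03 06:37:19",
                "2017-07-03 06:36:45",
            ]
        ),
    }
).sort_values(["number_from", "time"])

# One boolean per group: True when that group's times never decrease.
ordered = df1.groupby("number_from")["time"].apply(
    lambda s: s.is_monotonic_increasing
)
all_ordered = bool(ordered.all())

# True when rows of the same group sit next to each other in key order.
contiguous = bool(df1["number_from"].is_monotonic_increasing)

print(ordered)
print(all_ordered, contiguous)
```

The time column as a whole does not have to be increasing, only the times within each number_from block.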