Pandas GroupBy and sort_values don't work as expected

I have a dataset with the following columns:

Unnamed: 0         int64
id                object
number_from       object
number_to         object
time              object

I applied this to it: data1.groupby('number_from').apply(lambda x: x.sort_values('time'))

I get something like this:

Unnamed  id      number_from  number_to  time
17699    d20b3e  934          674        2017-07-03 06:36:20.000
17700    d20b81  934          674        2017-07-03 06:36:22.000
17701    d20b96  934          674        2017-07-03 06:36:23.000
**17703  d20c17  612          235        2017-07-03 06:36:28.000**
17707    d20db5  934          658        2017-07-03 06:36:45.000
17708    d20de9  934          658        2017-07-03 06:36:47.000
17710    d20e05  934          658        2017-07-03 06:36:49.000
17711    d20e41  934          658        2017-07-03 06:36:51.000
17712    d20e73  934          658        2017-07-03 06:36:53.000
17713    d20ecc  934          702        2017-07-03 06:36:57.000
17714    d20ef1  934          702        2017-07-03 06:36:59.000
17715    d20f32  934          702        2017-07-03 06:37:01.000
17716    d20f77  934          702        2017-07-03 06:37:03.000
17717    d20f8d  934          702        2017-07-03 06:37:05.000
17718    d20fd8  934          262        2017-07-03 06:37:08.000
17719    d21017  934          262        2017-07-03 06:37:11.000
17720    d21032  934          262        2017-07-03 06:37:12.000
17721    d2103e  934          262        2017-07-03 06:37:13.000
17722    d2106d  934          262        2017-07-03 06:37:15.000
**17723  d210c4  396          048        2017-07-03 06:37:19.000**
17725    d21147  934          691        2017-07-03 06:37:24.000
17726    d21167  934          691        2017-07-03 06:37:26.000

Note: this is just a subset of the values in my DataFrame. It only sorted the values; it did not group them.

I want to group the rows and sort the time values within each group, independently of the other groups.

I also tried data1.sort_values(['time'], ascending=False).groupby('number_from'), but the result is the same. What am I doing wrong?
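One likely reason the second attempt appears to change nothing: groupby() on its own returns a lazy GroupBy object and does not reorder the rows of the DataFrame. The minimal sketch below uses four rows taken from the question as an illustrative sample (the small frame and the name grouped are assumptions for the demo). Iterating over the groups shows that each group keeps the row order it had before grouping, so sorting first and then grouping does yield time-sorted groups, but only once you access the groups.

```python
import pandas as pd

# Illustrative sample: a few rows from the question, with string columns as in the original dtypes
data1 = pd.DataFrame({
    "id": ["d20c17", "d20b3e", "d210c4", "d20b81"],
    "number_from": ["612", "934", "396", "934"],
    "time": ["2017-07-03 06:36:28", "2017-07-03 06:36:20",
             "2017-07-03 06:37:19", "2017-07-03 06:36:22"],
})
data1["time"] = pd.to_datetime(data1["time"])

# groupby() alone returns a lazy GroupBy object; the DataFrame rows are not rearranged
grouped = data1.sort_values("time", ascending=False).groupby("number_from")
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating exposes each group; rows inside a group keep their pre-grouping (time-sorted) order
for key, group in grouped:
    print(key, list(group["id"]))
```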

I believe you need to sort by multiple columns together, but first convert the time column to datetimes:

import pandas as pd

data1['time'] = pd.to_datetime(data1['time'])
# if number_from should be compared as integers rather than as strings
data1['number_from'] = data1['number_from'].astype(int)

df1 = data1.sort_values(['number_from','time'])
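A self-contained sketch of this multi-column sort, using four rows copied from the question as sample data (the small frame is only an illustration, not the asker's full dataset). It also hints at why the astype(int) step can matter: compared as strings, keys are ordered character by character, so a longer number such as "1000" would sort before "934".

```python
import pandas as pd

# Illustrative rows from the question; every column arrives as a string (object dtype)
data1 = pd.DataFrame({
    "id": ["d20b3e", "d20c17", "d20b81", "d210c4"],
    "number_from": ["934", "612", "934", "396"],
    "number_to": ["674", "235", "674", "048"],
    "time": ["2017-07-03 06:36:20.000", "2017-07-03 06:36:28.000",
             "2017-07-03 06:36:22.000", "2017-07-03 06:37:19.000"],
})

# Parse the timestamps so they compare chronologically rather than as text
data1["time"] = pd.to_datetime(data1["time"])
# Compare the group keys numerically; as strings, "1000" would sort before "934"
data1["number_from"] = data1["number_from"].astype(int)

# Sorting by both columns keeps each number_from block together, ordered by time inside the block
df1 = data1.sort_values(["number_from", "time"])
print(df1)
```

If you want the newest rows first within each group, as in the ascending=False attempt, sort_values also accepts one flag per column, for example ascending=[True, False].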

EDIT:

I tested both solutions; they produce the same result:

data1['time'] = pd.to_datetime(data1['time'])

df1 = data1.sort_values(['number_from','time'])
print (df1)
             id  number_from  number_to                time
Unnamed                                                    
17723    d210c4          396         48 2017-07-03 06:37:19
17703    d20c17          612        235 2017-07-03 06:36:28
17699    d20b3e          934        674 2017-07-03 06:36:20
17700    d20b81          934        674 2017-07-03 06:36:22
17701    d20b96          934        674 2017-07-03 06:36:23
17707    d20db5          934        658 2017-07-03 06:36:45
17708    d20de9          934        658 2017-07-03 06:36:47
17710    d20e05          934        658 2017-07-03 06:36:49
17711    d20e41          934        658 2017-07-03 06:36:51
17712    d20e73          934        658 2017-07-03 06:36:53
17713    d20ecc          934        702 2017-07-03 06:36:57
17714    d20ef1          934        702 2017-07-03 06:36:59
17715    d20f32          934        702 2017-07-03 06:37:01
17716    d20f77          934        702 2017-07-03 06:37:03
17717    d20f8d          934        702 2017-07-03 06:37:05
17718    d20fd8          934        262 2017-07-03 06:37:08
17719    d21017          934        262 2017-07-03 06:37:11
17720    d21032          934        262 2017-07-03 06:37:12
17721    d2103e          934        262 2017-07-03 06:37:13
17722    d2106d          934        262 2017-07-03 06:37:15
17725    d21147          934        691 2017-07-03 06:37:24
17726    d21167          934        691 2017-07-03 06:37:26

For the groupby solution, add the parameter group_keys=False to avoid creating an extra index level from the number_from column:

df2 = data1.groupby('number_from', group_keys=False).apply(lambda x: x.sort_values('time'))
print (df2)
             id  number_from  number_to                time
Unnamed                                                    
17723    d210c4          396         48 2017-07-03 06:37:19
17703    d20c17          612        235 2017-07-03 06:36:28
17699    d20b3e          934        674 2017-07-03 06:36:20
17700    d20b81          934        674 2017-07-03 06:36:22
17701    d20b96          934        674 2017-07-03 06:36:23
17707    d20db5          934        658 2017-07-03 06:36:45
17708    d20de9          934        658 2017-07-03 06:36:47
17710    d20e05          934        658 2017-07-03 06:36:49
17711    d20e41          934        658 2017-07-03 06:36:51
17712    d20e73          934        658 2017-07-03 06:36:53
17713    d20ecc          934        702 2017-07-03 06:36:57
17714    d20ef1          934        702 2017-07-03 06:36:59
17715    d20f32          934        702 2017-07-03 06:37:01
17716    d20f77          934        702 2017-07-03 06:37:03
17717    d20f8d          934        702 2017-07-03 06:37:05
17718    d20fd8          934        262 2017-07-03 06:37:08
17719    d21017          934        262 2017-07-03 06:37:11
17720    d21032          934        262 2017-07-03 06:37:12
17721    d2103e          934        262 2017-07-03 06:37:13
17722    d2106d          934        262 2017-07-03 06:37:15
17725    d21147          934        691 2017-07-03 06:37:24
17726    d21167          934        691 2017-07-03 06:37:26

print (df1.equals(df2))
True
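To see what group_keys actually changes, the sketch below applies the same per-group sort once with the default group_keys=True and once with group_keys=False on a small hypothetical frame (sort_by_time is a helper name introduced here for illustration). Only the index is compared, because recent pandas releases deprecate passing the grouping column into apply() and may emit a warning about it, which can also affect which columns the result contains.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data, with times already parsed
data1 = pd.DataFrame({
    "id": ["d20b81", "d20c17", "d20b3e", "d210c4"],
    "number_from": [934, 612, 934, 396],
    "time": pd.to_datetime(["2017-07-03 06:36:22", "2017-07-03 06:36:28",
                            "2017-07-03 06:36:20", "2017-07-03 06:37:19"]),
})

def sort_by_time(group):
    # Sort the rows of a single group chronologically
    return group.sort_values("time")

# Default group_keys=True: the group key is prepended as an extra index level
with_keys = data1.groupby("number_from").apply(sort_by_time)
# group_keys=False: the original row labels are kept as a flat index
without_keys = data1.groupby("number_from", group_keys=False).apply(sort_by_time)

# The multi-column sort yields the same row order without apply()
df1 = data1.sort_values(["number_from", "time"])

print(with_keys.index.nlevels)     # 2
print(without_keys.index.nlevels)  # 1
print(list(without_keys.index) == list(df1.index))  # True
```

The plain sort_values call avoids apply() entirely, which is usually simpler and faster when the per-group operation is just a sort.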

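Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM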