
sort_values() with 'key' to sort a column of tuples in a dataframe

I have the following df:

import pandas as pd

df = pd.DataFrame({'Params': {0: (400, 30), 1: (2000, 10), 2: (1200, 10), 3: (2000, 30), 4: (1600, None)}, 'mean_test_score': {0: -0.6197478578718253, 1: -0.6164605619489576, 2: -0.6229674626212879, 3: -0.7963084775995496, 4: -0.7854265341671137}})

I wish to sort it by the first element of the tuples in the first column.

First column of the desired output: {'Params': {0: (400, 30), 2: (1200, 10), 4: (1600, None), 1: (2000, 10), 3: (2000, 30)}}

I tried df.sort_values(by=('Params'), key=lambda x:x[0]), as I would with a list and .sort, but I get the following error: ValueError: User-provided key function must not change the shape of the array.

I have looked at the documentation of sort_values(), but it did not explain why the lambda does not work.

EDIT: Following @DeepSpace's input, df.sort_values(by='Params') raises '<' not supported between instances of 'NoneType' and 'int', because of the None in the second element of one of the tuples.

Could anyone help?

The documentation of sort_values() says:

key should expect a Series and return a Series with the same shape as the input.

In df.sort_values(by=('Params'), key=lambda x:x[0]), x is the entire Params column, passed to the key function as a Series. x[0] therefore returns a single element of that Series (the tuple stored at index label 0), which does not have the same shape as the input Series. That is what triggers the error.
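To make the difference concrete, here is a minimal sketch using the Params values from the question. The variable names (params, first_row, first_elements) are only illustrative and do not appear in the original post.

```python
import pandas as pd

# The Params column from the question, as a standalone Series
params = pd.Series(
    [(400, 30), (2000, 10), (1200, 10), (2000, 30), (1600, None)],
    name="Params",
)

# Indexing the Series with 0 looks up index label 0 and returns one tuple,
# not one value per row
first_row = params[0]

# Mapping over the Series keeps one value per row, so the shape is preserved
first_elements = params.map(lambda t: t[0])

print(first_row)
print(first_elements.tolist())
```

The first result is a single tuple of length 2, while the second has one entry for each of the 5 rows, which is the shape sort_values() expects back from key.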

If you want to sort by the first element of each tuple, map over the column inside the key function:

df.sort_values(by='Params', key=lambda col: col.map(lambda x: x[0]))
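For reference, a self-contained version of this approach on the question's data might look like the sketch below. The scores are rounded for brevity, and kind="stable" is an addition not in the original answer: it is assumed here so that the two rows sharing 2000 as first element keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    "Params": [(400, 30), (2000, 10), (1200, 10), (2000, 30), (1600, None)],
    "mean_test_score": [-0.6197, -0.6165, -0.6230, -0.7963, -0.7854],
})

# key receives the whole Params column and returns a same-length Series
# holding only the first element of each tuple
sorted_df = df.sort_values(
    by="Params",
    key=lambda col: col.map(lambda t: t[0]),
    kind="stable",  # assumption: keep tied rows in their original order
)

sorted_index = sorted_df.index.tolist()
print(sorted_df)
```

Because only the first element of each tuple is compared, the None in (1600, None) never takes part in a comparison, so the TypeError mentioned in the question's edit does not arise.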
