Pandas sort column with numerical string

I have the DataFrame below:

col1

Numb10
Numb11
Numb12
Numb7
Numb8

How can I sort it by the numeric part, so that the result is:

col1

Numb7
Numb8
Numb10
Numb11
Numb12

I tried the following, but got the error TypeError: cannot convert the series to <class 'int'>:

df.sort_values(by = "col1", key = (lambda x: int(x[4:])))

Update: there is also one missing value in col1.

The key function in sort_values receives the whole Series as its argument, not individual elements. From the docs:

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.
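A small sketch to make the docs concrete: the hypothetical key function show_key below only records the type of its argument and returns it unchanged. It shows that sort_values hands the key a Series, which is why int(x[4:]) in the question fails (it tries to convert a whole slice of a Series to one int). Returning the Series unchanged gives plain string order, where "Numb10" sorts before "Numb7".

```python
import pandas as pd

df = pd.DataFrame({"col1": ["Numb10", "Numb11", "Numb12", "Numb7", "Numb8"]})

seen_types = []

def show_key(s):
    # Hypothetical helper: record what sort_values actually passes in
    seen_types.append(type(s))
    # Returning the input unchanged keeps the default (lexicographic) order
    return s

lexicographic = df.sort_values(by="col1", key=show_key)
print(all(issubclass(t, pd.Series) for t in seen_types))
print(lexicographic["col1"].tolist())
```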

In your case, you can use .str for slicing and astype for type conversion:

df.sort_values(by='col1', key=lambda s: s.str[4:].astype(int))
     col1
3   Numb7
4   Numb8
0  Numb10
1  Numb11
2  Numb12
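The s.str[4:] slice assumes every value has the same 4-character prefix "Numb". If the prefix length can vary (an assumption beyond the question), one option is to extract the trailing digits with a regular expression instead. This is a sketch using Series.str.extract with expand=False, which returns a Series as the key function requires. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: prefixes of different lengths, unlike the question
df = pd.DataFrame({"col1": ["Numb10", "N7", "Number8", "Numb12", "Nb11"]})

def trailing_number(s):
    # Capture the run of digits at the end of each string; expand=False
    # returns a Series (not a DataFrame), then convert it to integers
    return s.str.extract(r"(\d+)$", expand=False).astype(int)

result = df.sort_values(by="col1", key=trailing_number)
print(result["col1"].tolist())  # ['N7', 'Number8', 'Numb10', 'Nb11', 'Numb12']
```

Like the .str[4:] version, astype(int) will still raise if any value has no trailing digits or is missing.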

Your x[4:] might not always be an integer (for example, when a value is missing). You can check with:

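import pandas as pd

# Convert the suffix to numbers; anything that is not numeric becomes NaN
# (the result is float64 whenever a NaN is present, not int)
extracted_nums = pd.to_numeric(df['col1'].str[4:], errors='coerce')

# Check for invalid values:
# True means at least one entry is missing or not numeric
print(extracted_nums.isna().any())

# Sort by the numeric values (NaN goes last) and reorder the rows to match
df.loc[extracted_nums.sort_values().index]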
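The two ideas can also be combined into a single sort_values call: use pd.to_numeric(..., errors='coerce') inside the key function, so missing or non-numeric entries become NaN instead of raising, and let na_position control where they end up. This is a sketch; the DataFrame with one np.nan is a hypothetical stand-in for the updated data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value, as in the question's update
df = pd.DataFrame({"col1": ["Numb10", "Numb11", np.nan, "Numb7", "Numb8"]})

def numeric_suffix(s):
    # .str[4:] propagates NaN; errors="coerce" turns anything non-numeric into NaN
    return pd.to_numeric(s.str[4:], errors="coerce")

# na_position="last" (the default) puts the missing row at the end
result = df.sort_values(by="col1", key=numeric_suffix, na_position="last")
print(result["col1"].tolist())
```

Unlike df.loc[...index], this does not rely on the index labels being unique.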