
Sorting a string DataFrame with Python pandas

I have a column like the one in the output below and want to sort it in ascending order, starting from General-0 and increasing. I tried the code below, but it doesn't work. How can I get this done? The column's dtype is shown as object.

dt.sort_values('run')

Output:

        run               
717   General-25-20180121-15:27:27-3807  
824   General-26-20180121-15:27:28-3812  
931   General-27-20180121-15:27:29-3818  
1038  General-28-20180121-15:27:30-3823  
1145  General-29-20180121-15:27:30-3828  
1252  General-30-20180121-15:27:31-3833  
1359  General-31-20180121-15:27:31-3838  
1466  General-32-20180121-15:27:32-3843  
1573  General-33-20180121-15:27:33-3848  
1680  General-34-20180121-15:27:33-3855 
1787   General-0-20180121-15:27:08-3680 
1894   General-1-20180121-15:27:09-3685  
2001   General-2-20180121-15:27:10-3690  
2108   General-3-20180121-15:27:11-3695  
2215   General-4-20180121-15:27:11-3700  
2322   General-5-20180121-15:27:12-3706
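
For context, here is a minimal sketch of two things that can make dt.sort_values('run') look like it "doesn't work". It assumes a hypothetical three-row sample in the same format as the column above. First, sort_values returns a new DataFrame and leaves dt unchanged unless the result is assigned. Second, object-dtype strings are compared character by character, so General-25 sorts before General-3.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as the question's column.
dt = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ]
    },
    index=[717, 1787, 2108],
)

# sort_values returns a new frame; dt itself stays in its original order.
by_string = dt.sort_values("run")

# Strings compare character by character, so "General-25..." comes before
# "General-3..." because "2" < "3", whatever digits follow.
print(by_string["run"].tolist())
print(dt.index.tolist())  # original order is untouched
```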

The simplest approach, if the index values are not important, is to use sorted with a custom key function:

df['run'] = sorted(df['run'], key=lambda x: int(x.split('-')[1]))
print (df)
                                    run
717    General-0-20180121-15:27:08-3680
824    General-1-20180121-15:27:09-3685
931    General-2-20180121-15:27:10-3690
1038   General-3-20180121-15:27:11-3695
1145   General-4-20180121-15:27:11-3700
1252   General-5-20180121-15:27:12-3706
1359  General-25-20180121-15:27:27-3807
1466  General-26-20180121-15:27:28-3812
1573  General-27-20180121-15:27:29-3818
1680  General-28-20180121-15:27:30-3823
1787  General-29-20180121-15:27:30-3828
1894  General-30-20180121-15:27:31-3833
2001  General-31-20180121-15:27:31-3838
2108  General-32-20180121-15:27:32-3843
2215  General-33-20180121-15:27:33-3848
2322  General-34-20180121-15:27:33-3855
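
One caveat worth spelling out: sorted returns a plain list, so the values are assigned back by position and the index labels stay where they were. The sketch below assumes a hypothetical extra column named other, which is not in the question, to show that any other columns do not move with run.

```python
import pandas as pd

# Hypothetical sample; the "other" column exists only to show the caveat.
df = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ],
        "other": ["a", "b", "c"],
    },
    index=[717, 1787, 2108],
)


def run_number(value):
    """Return the integer between the first and second hyphen."""
    return int(value.split("-")[1])


# sorted() yields a list, which pandas writes back row by row by position.
df["run"] = sorted(df["run"], key=run_number)

# The run values are now ordered, but the index and "other" are unchanged,
# so "a" no longer sits next to the General-25 row it started with.
print(df)
```

If the rows carry other data that must stay aligned with run, one of the index-preserving approaches is probably the safer choice.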

If the index values are important, first split the strings, select the second element of each with str[1], cast to integers, and then use argsort with iloc to reorder the rows:

df = df.iloc[df['run'].str.split('-').str[1].astype(int).argsort()]
print (df)
                                    run
1787   General-0-20180121-15:27:08-3680
1894   General-1-20180121-15:27:09-3685
2001   General-2-20180121-15:27:10-3690
2108   General-3-20180121-15:27:11-3695
2215   General-4-20180121-15:27:11-3700
2322   General-5-20180121-15:27:12-3706
717   General-25-20180121-15:27:27-3807
824   General-26-20180121-15:27:28-3812
931   General-27-20180121-15:27:29-3818
1038  General-28-20180121-15:27:30-3823
1145  General-29-20180121-15:27:30-3828
1252  General-30-20180121-15:27:31-3833
1359  General-31-20180121-15:27:31-3838
1466  General-32-20180121-15:27:32-3843
1573  General-33-20180121-15:27:33-3848
1680  General-34-20180121-15:27:33-3855
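
To make the chained expression easier to follow, here is the same idea broken into named steps. This is only a sketch on a hypothetical three-row sample; the intermediate names parts, numbers and order are introduced here for illustration.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as the question's column.
df = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ]
    },
    index=[717, 1787, 2108],
)

# Step 1: split each string on "-" into a list of pieces.
parts = df["run"].str.split("-")

# Step 2: take the second piece of each list and convert it to an integer.
numbers = parts.str[1].astype(int)

# Step 3: argsort gives the row positions that would sort those integers.
order = numbers.argsort()

# Step 4: iloc reorders the rows by position, so index labels travel with them.
result = df.iloc[order.to_numpy()]
print(result)
```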

You can use split to build a helper key for the sort, then drop it when finished:

df.assign(helpkey=df.run.str.split('-',expand=True)[1].astype(int)).sort_values('helpkey').drop(columns='helpkey')
Out[750]: 
                                    run
1787   General-0-20180121-15:27:08-3680
1894   General-1-20180121-15:27:09-3685
2001   General-2-20180121-15:27:10-3690
2108   General-3-20180121-15:27:11-3695
2215   General-4-20180121-15:27:11-3700
2322   General-5-20180121-15:27:12-3706
717   General-25-20180121-15:27:27-3807
824   General-26-20180121-15:27:28-3812
931   General-27-20180121-15:27:29-3818
1038  General-28-20180121-15:27:30-3823
1145  General-29-20180121-15:27:30-3828
1252  General-30-20180121-15:27:31-3833
1359  General-31-20180121-15:27:31-3838
1466  General-32-20180121-15:27:32-3843
1573  General-33-20180121-15:27:33-3848
1680  General-34-20180121-15:27:33-3855
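
As a possible alternative to the temporary column, sort_values accepts a key argument in pandas 1.1 and later. The callable receives the whole column as a Series and should return a Series of sort keys. The sketch below assumes such a pandas version and a hypothetical three-row sample.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as the question's column.
df = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ]
    },
    index=[717, 1787, 2108],
)

# key is applied to the "run" Series as a whole (requires pandas >= 1.1);
# rows are ordered by the returned integers, and no helper column is created.
result = df.sort_values(
    "run",
    key=lambda s: s.str.split("-").str[1].astype(int),
)
print(result)
```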

You can use numpy.argsort together with pd.DataFrame.iloc.

This method preserves the index of the original dataframe.

import numpy as np

res = df.iloc[np.argsort([int(i.split('-')[1]) for i in df['run']])]

print(res)

#                                     run
# 1787   General-0-20180121-15:27:08-3680
# 1894   General-1-20180121-15:27:09-3685
# 2001   General-2-20180121-15:27:10-3690
# 2108   General-3-20180121-15:27:11-3695
# 2215   General-4-20180121-15:27:11-3700
# 2322   General-5-20180121-15:27:12-3706
# 717   General-25-20180121-15:27:27-3807
# 824   General-26-20180121-15:27:28-3812
# 931   General-27-20180121-15:27:29-3818
# 1038  General-28-20180121-15:27:30-3823
# 1145  General-29-20180121-15:27:30-3828
# 1252  General-30-20180121-15:27:31-3833
# 1359  General-31-20180121-15:27:31-3838
# 1466  General-32-20180121-15:27:32-3843
# 1573  General-33-20180121-15:27:33-3848
# 1680  General-34-20180121-15:27:33-3855
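
If several rows can share the same run number, the default sort kind of numpy.argsort does not guarantee their relative order. A small sketch, assuming hypothetical shortened run strings with repeated numbers, shows how kind="stable" keeps tied rows in their original order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with repeated run numbers (3 and 0 each appear twice).
df = pd.DataFrame(
    {"run": ["General-3-x", "General-0-y", "General-3-z", "General-0-w"]},
    index=[10, 20, 30, 40],
)

# Integer sort keys taken from between the first and second hyphen.
keys = [int(i.split("-")[1]) for i in df["run"]]

# kind="stable" keeps rows with equal keys in their original relative order.
res = df.iloc[np.argsort(keys, kind="stable")]
print(res)
```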
