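Sorting a string column of a pandas DataFrame in Python
I have a column in my output that I want sorted in ascending order, starting from General-0. I tried the following, but it does not work. How can I get this done? The dtype is shown as object.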
dt.sort_values('run')
Output:
run
717 General-25-20180121-15:27:27-3807
824 General-26-20180121-15:27:28-3812
931 General-27-20180121-15:27:29-3818
1038 General-28-20180121-15:27:30-3823
1145 General-29-20180121-15:27:30-3828
1252 General-30-20180121-15:27:31-3833
1359 General-31-20180121-15:27:31-3838
1466 General-32-20180121-15:27:32-3843
1573 General-33-20180121-15:27:33-3848
1680 General-34-20180121-15:27:33-3855
1787 General-0-20180121-15:27:08-3680
1894 General-1-20180121-15:27:09-3685
2001 General-2-20180121-15:27:10-3690
2108 General-3-20180121-15:27:11-3695
2215 General-4-20180121-15:27:11-3700
2322 General-5-20180121-15:27:12-3706
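For context on why a plain sort_values does not give numeric order: with dtype object the values are compared as text, character by character, so "General-10" comes before "General-2". A minimal sketch with shortened, made-up run labels (not the real data above):

```python
import pandas as pd

# Hypothetical, shortened labels used only to illustrate the ordering.
runs = pd.Series(["General-2", "General-10", "General-1"])

# Text comparison: "General-10" sorts before "General-2" because '1' < '2'.
as_text = runs.sort_values().tolist()

# Comparing the numeric part as integers gives the intended order.
numbers = runs.str.split("-").str[1].astype(int).sort_values().tolist()

print(as_text)
print(numbers)
```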
The simplest approach, if the index values are not important, is to use sorted with a custom key function:
df['run'] = sorted(df['run'], key=lambda x: int(x.split('-')[1]))
print (df)
run
717 General-0-20180121-15:27:08-3680
824 General-1-20180121-15:27:09-3685
931 General-2-20180121-15:27:10-3690
1038 General-3-20180121-15:27:11-3695
1145 General-4-20180121-15:27:11-3700
1252 General-5-20180121-15:27:12-3706
1359 General-25-20180121-15:27:27-3807
1466 General-26-20180121-15:27:28-3812
1573 General-27-20180121-15:27:29-3818
1680 General-28-20180121-15:27:30-3823
1787 General-29-20180121-15:27:30-3828
1894 General-30-20180121-15:27:31-3833
2001 General-31-20180121-15:27:31-3838
2108 General-32-20180121-15:27:32-3843
2215 General-33-20180121-15:27:33-3848
2322 General-34-20180121-15:27:33-3855
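Note that assigning a sorted list back into one column only reorders that column, so in a frame with several columns the values would no longer line up with their rows. As a sketch, assuming pandas 1.1 or later (where sort_values accepts a key argument), whole rows can be sorted and the index renumbered with ignore_index=True. The "value" column and the run_number helper are made up for illustration:

```python
import pandas as pd

# Three rows taken from the question, plus a hypothetical "value" column.
df = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ],
        "value": [25, 0, 3],
    },
    index=[717, 1787, 2108],
)

def run_number(col: pd.Series) -> pd.Series:
    """Return the integer between the first and second '-' of each value."""
    return col.str.split("-").str[1].astype(int)

# key receives the whole column and must return a column of the same length.
result = df.sort_values("run", key=run_number, ignore_index=True)
print(result)
```

Here every row moves as a unit, so "value" stays attached to its run.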
If the index values are important, first split the strings, select the second value with str[1], convert it to an integer, and use argsort together with iloc:
df = df.iloc[df['run'].str.split('-').str[1].astype(int).argsort()]
print (df)
run
1787 General-0-20180121-15:27:08-3680
1894 General-1-20180121-15:27:09-3685
2001 General-2-20180121-15:27:10-3690
2108 General-3-20180121-15:27:11-3695
2215 General-4-20180121-15:27:11-3700
2322 General-5-20180121-15:27:12-3706
717 General-25-20180121-15:27:27-3807
824 General-26-20180121-15:27:28-3812
931 General-27-20180121-15:27:29-3818
1038 General-28-20180121-15:27:30-3823
1145 General-29-20180121-15:27:30-3828
1252 General-30-20180121-15:27:31-3833
1359 General-31-20180121-15:27:31-3838
1466 General-32-20180121-15:27:32-3843
1573 General-33-20180121-15:27:33-3848
1680 General-34-20180121-15:27:33-3855
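A variation on the same idea, as a sketch: str.extract with a regular expression pulls out the number directly. The pattern below assumes every value starts with "General-" followed by digits and a dash; a value that does not match would come back as NaN and the integer conversion would fail, which at least makes bad rows visible.

```python
import pandas as pd

# Three rows taken from the question.
df = pd.DataFrame(
    {
        "run": [
            "General-25-20180121-15:27:27-3807",
            "General-0-20180121-15:27:08-3680",
            "General-3-20180121-15:27:11-3695",
        ]
    },
    index=[717, 1787, 2108],
)

# One capture group with expand=False returns a Series of the captured text.
num = df["run"].str.extract(r"^General-(\d+)-", expand=False).astype(int)

# argsort gives the positions that would sort num; iloc reorders by position.
order = num.argsort().to_numpy()
result = df.iloc[order]
print(result)
```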
You can use split to build a helper key for sorting, then drop it once you are done:
df.assign(helpkey=df.run.str.split('-',expand=True)[1].astype(int)).sort_values('helpkey').drop(columns='helpkey')
Out[750]:
run
1787 General-0-20180121-15:27:08-3680
1894 General-1-20180121-15:27:09-3685
2001 General-2-20180121-15:27:10-3690
2108 General-3-20180121-15:27:11-3695
2215 General-4-20180121-15:27:11-3700
2322 General-5-20180121-15:27:12-3706
717 General-25-20180121-15:27:27-3807
824 General-26-20180121-15:27:28-3812
931 General-27-20180121-15:27:29-3818
1038 General-28-20180121-15:27:30-3823
1145 General-29-20180121-15:27:30-3828
1252 General-30-20180121-15:27:31-3833
1359 General-31-20180121-15:27:31-3838
1466 General-32-20180121-15:27:32-3843
1573 General-33-20180121-15:27:33-3848
1680 General-34-20180121-15:27:33-3855
You can use numpy.argsort together with pd.DataFrame.iloc.
This method preserves the index of the original DataFrame.
import numpy as np

res = df.iloc[np.argsort([int(i.split('-')[1]) for i in df['run']])]
print(res)
# run
# 1787 General-0-20180121-15:27:08-3680
# 1894 General-1-20180121-15:27:09-3685
# 2001 General-2-20180121-15:27:10-3690
# 2108 General-3-20180121-15:27:11-3695
# 2215 General-4-20180121-15:27:11-3700
# 2322 General-5-20180121-15:27:12-3706
# 717 General-25-20180121-15:27:27-3807
# 824 General-26-20180121-15:27:28-3812
# 931 General-27-20180121-15:27:29-3818
# 1038 General-28-20180121-15:27:30-3823
# 1145 General-29-20180121-15:27:30-3828
# 1252 General-30-20180121-15:27:31-3833
# 1359 General-31-20180121-15:27:31-3838
# 1466 General-32-20180121-15:27:32-3843
# 1573 General-33-20180121-15:27:33-3848
# 1680 General-34-20180121-15:27:33-3855
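If two rows can share the same run number, the default sort used by numpy.argsort does not promise to keep them in their original relative order. A small sketch, using made-up labels with a deliberate tie, that passes kind="stable" to keep ties in input order:

```python
import numpy as np
import pandas as pd

# Hypothetical labels: two rows share run number 1.
df = pd.DataFrame(
    {"run": ["General-1-b", "General-0-a", "General-1-a"]},
    index=[10, 20, 30],
)

keys = np.array([int(s.split("-")[1]) for s in df["run"]])

# A stable sort keeps rows with equal keys in their original order.
order = np.argsort(keys, kind="stable")
res = df.iloc[order]
print(res)
```

The original index labels travel with the rows, as in the answer above.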