How to sort only selected columns in a pandas DataFrame
I want to sort some columns in a large pandas DataFrame. Those columns are in the middle of df and at the end. They start with "R".
columns_list = df.columns.tolist()
columns_list
Out[17]:
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']
I would like to reorder them like this:
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']
Until now I did it manually:
df = df[['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']]
but new input data has more R columns, and they differ from file to file.
I would appreciate your advice.
This is surprisingly challenging. I can't find a one-liner, and the easiest I can find is:
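# find the R columns (names that are "R" followed only by digits)
mask = df.columns.str.match(r'^R\d*$')
# copy the labels into a NumPy array so the DataFrame's own index is not mutated
columns = df.columns.to_numpy(copy=True)
# sort the R parts; this is a plain string sort, so 'R10' would land before 'R2'
columns[mask] = sorted(columns[mask])
# assign back
df = df.reindex(columns, axis=1)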
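The masking idea above can be extended to cover the "R…S…" columns and to sort numerically rather than as strings. The following is only a sketch: the helper names `natural_key` and `sort_matching_columns` are made up for illustration, and it assumes the column names follow the `R<digits>` and `R<digits>S<digits>` patterns shown in the question.

```python
import re

import numpy as np
import pandas as pd


def natural_key(name):
    # "R10S2" -> ["R", 10, "S", 2, ""], so numeric parts compare as numbers
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def sort_matching_columns(df, pattern):
    # Hypothetical helper: sort only the columns whose full name matches
    # `pattern`, leaving every other column in its original position.
    cols = np.array(df.columns, dtype=object)  # a copy, not the index itself
    mask = np.array([re.fullmatch(pattern, c) is not None for c in cols])
    cols[mask] = sorted(cols[mask], key=natural_key)
    return df.reindex(columns=cols)


demo = pd.DataFrame(columns=["Id", "R5", "R2", "R10", "S3", "S1", "R5S3", "R1S4"])
# Sort each group separately so "R1S4" is not mixed in among "R1", "R2", ...
result = sort_matching_columns(demo, r"R\d+")
result = sort_matching_columns(result, r"R\d+S\d+")
print(list(result.columns))
# ['Id', 'R2', 'R5', 'R10', 'S3', 'S1', 'R1S4', 'R5S3']
```

Sorting the two groups in separate passes keeps each group in its own slots; a single pass over everything starting with "R" would interleave the two kinds of names.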
import pandas as pd
import numpy as np

cols = list('xacybz')
df = pd.DataFrame(np.random.randn(10, len(cols)), columns=cols)
# columns that should come first, in this exact order
preordered = list('xyz')
# Index.difference returns the remaining labels in sorted order
new_order = preordered + list(df.columns.difference(preordered))
df = df.reindex(columns=new_order)
This should work, assuming the names of the columns other than the "R" and "S" ones are not changing. If they are, I think you would have to use a regex to find the names of the columns you want to sort.
I am sorting the names here by length and then alphabetically, which I think matches how you are doing it.
new_df_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                  'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
                  'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4', 'R1S4']
df = pd.DataFrame(columns=new_df_columns)
base_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                'Planet', 'ExtraterrestialSupplier']
# everything that is not a base column gets sorted
extra_cols = [name for name in new_df_columns if name not in base_columns]
# sort by length first, then alphabetically
sorted_extra = sorted(extra_cols, key=lambda x: (len(x), x))
df = df[base_columns + sorted_extra]
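If the base column names do change between files, the regex variant mentioned above might look roughly like the sketch below. `EXTRA_PATTERN` and `reorder` are illustrative names, and the pattern assumes the extra columns are always named `R<digits>`, `S<digits>`, or `R<digits>S<digits>`. Like the code above, it also sorts the "S" columns and moves all extras to the end.

```python
import re

import pandas as pd

# Assumed naming scheme for the columns that should be sorted
EXTRA_PATTERN = re.compile(r"R\d+(?:S\d+)?|S\d+")


def reorder(df):
    # Columns matching the pattern are sorted; all others keep their order
    extra = [c for c in df.columns if EXTRA_PATTERN.fullmatch(c)]
    base = [c for c in df.columns if not EXTRA_PATTERN.fullmatch(c)]
    # Sort by length first, then alphabetically, as in the code above
    return df[base + sorted(extra, key=lambda name: (len(name), name))]


sample = pd.DataFrame(columns=["Id", "Radius", "R5", "R1", "S2", "S1", "R5S3", "R1S4"])
reordered = reorder(sample)
print(list(reordered.columns))
# ['Id', 'Radius', 'R1', 'R5', 'S1', 'S2', 'R1S4', 'R5S3']
```

Using `fullmatch` keeps ordinary names such as "Radius" or "Ship" out of the sorted group even though they begin with "R" or "S".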