
How to sort only selected columns in a pandas DataFrame

I want to sort some columns in a large pandas DataFrame. Those columns are in the middle and at the end of the df, and they start with "R".

columns_list = df.columns.tolist()
columns_list
Out[17]: 
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']

I would like to reorder them like this:

['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

Until now I have done it manually:

df = df[['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2', 'R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3', 'R5S4']]

But the new input data has more R columns, and the set of R columns differs from file to file.

I would appreciate your advice.
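The manual list above follows a pattern: the pure R columns are sorted among the positions they already occupy, the S columns are left alone, and the combined RxSy columns are sorted among their own positions. A minimal sketch that rebuilds that order from whatever columns a file has, assuming those two regexes describe the column families; `sort_within_groups` and `natural_key` are hypothetical helper names, not pandas functions.

```python
import re

import pandas as pd


def natural_key(name):
    # Hypothetical helper: split "R5S12" into ["R", 5, "S", 12, ""] so digit runs compare as numbers
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def sort_within_groups(columns, patterns):
    """Hypothetical helper: sort the columns matching each regex among the
    positions they already occupy. Columns matching no pattern stay put."""
    result = list(columns)
    for pattern in patterns:
        positions = [i for i, name in enumerate(result) if re.fullmatch(pattern, name)]
        ordered = sorted((result[i] for i in positions), key=natural_key)
        for i, name in zip(positions, ordered):
            result[i] = name
    return result


cols = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet',
        'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1',
        'S6', 'S5', 'R5S3', 'R5S2', 'R5S4', 'R1S4']
df = pd.DataFrame(columns=cols)

# "R<n>" and "R<n>S<m>" are sorted as two separate groups;
# "S<n>" columns match neither pattern, so they keep their positions.
df = df[sort_within_groups(df.columns, [r"R\d+", r"R\d+S\d+"])]
print(list(df.columns))
```

Because the key compares digit runs as integers, a file that adds `R10` gets it after `R9` rather than between `R1` and `R2`.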

This is surprisingly challenging. I can't find a one-liner; the easiest approach I can find is:

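# find the R columns (raw string, so "\d" is not read as a string escape);
# note: this matches R1, R2, ... only, not combined names such as R5S3
mask = df.columns.str.match(r'^R\d*$')

# numpy array; copied, because df.columns.values may share memory with the Index
columns = df.columns.to_numpy(copy=True)

# sort the R parts, leaving every other column where it is
columns[mask] = sorted(columns[mask])

# assign back
df = df.reindex(columns, axis=1)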
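`sorted()` compares the names as strings, so once a file contains `R10` it lands between `R1` and `R2`. A sketch of the same mask-and-reassign idea with a numeric sort key, assuming the R columns are always "R" followed by digits (the pattern is tightened to `\d+` so the key never sees a bare "R"); the example column names are made up.

```python
import numpy as np
import pandas as pd

# Made-up example frame that includes a two-digit R column
df = pd.DataFrame(np.zeros((2, 6)), columns=['Id', 'R10', 'R2', 'S1', 'R1', 'R1S4'])

# Pure R columns only: "R" followed by digits (R1S4 is not matched)
mask = df.columns.str.match(r'^R\d+$')

# Work on a copy so the DataFrame's own column Index is never modified in place
columns = df.columns.to_numpy(copy=True)

# Sort by the integer after "R" instead of by the string
columns[mask] = sorted(columns[mask], key=lambda name: int(name[1:]))

df = df.reindex(columns=columns)
print(list(df.columns))  # ['Id', 'R1', 'R2', 'S1', 'R10', 'R1S4']
```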
import pandas as pd
import numpy as np

cols = list('xacybz')
df = pd.DataFrame(np.random.randn(10, len(cols)), columns=cols)

# columns that should come first, in this order
preordered = list('xyz')

# all remaining columns; Index.difference returns them sorted by default
new_order = preordered + list(df.columns.difference(preordered))

# reindex returns a new DataFrame, so assign the result
df = df.reindex(columns=new_order)
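`Index.difference` sorts the remaining labels. If the remaining columns should instead keep their current left-to-right order, a plain list comprehension does that; a sketch on the same toy frame:

```python
import numpy as np
import pandas as pd

cols = list('xacybz')
df = pd.DataFrame(np.random.randn(10, len(cols)), columns=cols)

# Columns that should come first, in this order
preordered = list('xyz')

# Everything else, kept in its existing order (not sorted)
rest = [c for c in df.columns if c not in preordered]

df = df[preordered + rest]
print(list(df.columns))  # ['x', 'y', 'z', 'a', 'c', 'b']
```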

This should work, assuming the non-"R" and non-"S" column names do not change. If they do, you would probably need a regex to find the names of the columns you want to sort.

I am sorting the names here by length and then alphabetically, which I think matches the order you want.

import pandas as pd

new_df_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                  'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
                  'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4', 'R1S4']
df = pd.DataFrame(columns=new_df_columns)

# columns that keep their fixed order at the front
base_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                'Planet', 'ExtraterrestialSupplier']
extra_cols = [name for name in new_df_columns if name not in base_columns]
# sort by name length first, then alphabetically
sorted_extra = sorted(extra_cols, key=lambda x: (len(x), x))

df = df[base_columns + sorted_extra]
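If the base column names do change between files, a regex can pick out the columns to sort instead of a hand-written `base_columns` list. A sketch, assuming every column to be sorted starts with "R" or "S" immediately followed by a digit and no base column does (so "Radius" and "Ship" are not caught); the example column names are made up. The `(len(x), x)` key places `R10` after `R2`, but note that it also puts `S1` ahead of `R10`, because length is compared first.

```python
import re

import pandas as pd

# Made-up example columns
df = pd.DataFrame(columns=['Id', 'Name', 'Radius', 'R10', 'R2', 'S1', 'R1', 'R1S4', 'Planet'])

# Columns to sort: "R" or "S" immediately followed by a digit
is_sortable = [bool(re.match(r'[RS]\d', name)) for name in df.columns]

base_columns = [name for name, s in zip(df.columns, is_sortable) if not s]
extra_cols = [name for name, s in zip(df.columns, is_sortable) if s]

# Shorter names first, then alphabetical: R1, R2, S1, R10, R1S4
sorted_extra = sorted(extra_cols, key=lambda x: (len(x), x))

df = df[base_columns + sorted_extra]
print(list(df.columns))
```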

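Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.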

© 2020-2024 STACKOOM.COM