How to sort only selected columns in a pandas dataframe

Question

I want to sort some columns in a large pandas dataframe. Those columns are in the middle of df and at the end. Their names start with "R".

columns_list = df.columns.tolist()
columns_list
Out[17]: 
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']

I would like to re-order them like this:

['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

Until now I did it manually:

df = df[['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2', 'R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3', 'R5S4']]

but new input data have more R columns, and they differ in every file.

I would appreciate your advice.

Answer 1

This is surprisingly challenging. I can't find a one-liner, and the easiest approach I can find is:

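# find the R columns: "R" followed only by digits,
# so 'Radius' and 'R5S3' are not matched
mask = df.columns.str.match(r'^R\d*$')

# copy the labels into a numpy array that is safe to modify
columns = df.columns.to_numpy(copy=True)

# sort the R parts; all other columns keep their positions
columns[mask] = sorted(columns[mask])

# assign back
df = df.reindex(columns, axis=1)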
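
One caveat: sorted() compares the names as strings, so with ten or more R columns 'R10' would end up before 'R2'. Below is a minimal sketch of a numeric variant. The helper name sort_r_columns and the demo column names are made up for illustration, and it assumes all column labels are strings.

```python
import re

import pandas as pd


def sort_r_columns(df, pattern=r"^R(\d+)$"):
    """Return df with the columns matching `pattern` sorted by their number.

    Hypothetical helper, not part of pandas. Columns that do not match
    keep their positions.
    """
    regex = re.compile(pattern)
    columns = list(df.columns)
    # positions currently occupied by the columns to be sorted
    slots = [i for i, name in enumerate(columns) if regex.match(name)]
    # the matching names, ordered by the integer after "R"
    ordered = sorted((columns[i] for i in slots),
                     key=lambda name: int(regex.match(name).group(1)))
    # write the sorted names back into the same positions
    for i, name in zip(slots, ordered):
        columns[i] = name
    return df[columns]


# made-up example frame with no rows
demo = pd.DataFrame(columns=["Id", "Radius", "R10", "R2", "R1", "S3", "S1"])
result_columns = list(sort_r_columns(demo).columns)
print(result_columns)
```

Here 'Radius' and the S columns stay where they are, and only the three R slots are rearranged.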

Answer 2

import numpy as np
import pandas as pd

# example frame with columns in arbitrary order
cols = list('xacybz')
df = pd.DataFrame(np.random.rand(5, len(cols)), columns=cols)

# columns that should come first, in this fixed order
preordered = list('xyz')

# remaining columns; Index.difference returns them sorted
new_order = preordered + list(df.columns.difference(preordered))

df = df.reindex(columns=new_order)
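
If the remaining columns should keep their original relative order instead of being sorted, a plain list comprehension is enough. This is a small sketch on the same made-up column names, not part of the answer itself.

```python
import pandas as pd

# same made-up column names as in the example, no rows needed
df = pd.DataFrame(columns=list('xacybz'))
preordered = list('xyz')

# keep every other column in the order it already has in df
rest = [name for name in df.columns if name not in preordered]
new_order = preordered + rest

df = df.reindex(columns=new_order)
print(list(df.columns))
```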

Answer 3

This should work, assuming the names of the columns other than the "R" and "S" ones are not changing. If they are, I think you would have to use a regex to find the names of the columns you want to sort.

I am sorting the names here by length and then alphabetically, which I think matches how you are doing it.

import pandas as pd

# empty frame with the column names from the question
new_df_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                  'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
                  'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4', 'R1S4']
df = pd.DataFrame(columns=new_df_columns)

# fixed columns that stay in front
base_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
                'Planet', 'ExtraterrestialSupplier']
extra_cols = [name for name in new_df_columns if name not in base_columns]
# sort by length first, then alphabetically
sorted_extra = sorted(extra_cols, key=lambda x: (len(x), x))

df = df[base_columns + sorted_extra]
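
If the base column names do change, the regex route mentioned above could look roughly like this. It is only a sketch: it assumes the columns to sort are named R<number> or R<number>S<number>, and the names pattern and sort_key are made up here. Unlike the length-based sort, it leaves the S columns untouched and only rearranges the R names within the positions they already occupy.

```python
import re

import pandas as pd

columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
           'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
           'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4', 'R1S4']
df = pd.DataFrame(columns=columns)

# assumption: only R<digits> and R<digits>S<digits> names are sorted
pattern = re.compile(r'^R(\d+)(?:S(\d+))?$')


def sort_key(name):
    r, s = pattern.match(name).groups()
    # plain R columns first, then numeric order of the R and S parts
    return (s is not None, int(r), int(s or 0))


# positions of the matching columns, and their names in sorted order
slots = [i for i, name in enumerate(columns) if pattern.match(name)]
ordered = sorted((columns[i] for i in slots), key=sort_key)

# put the sorted names back into the same positions
new_columns = list(columns)
for i, name in zip(slots, ordered):
    new_columns[i] = name

df = df[new_columns]
print(list(df.columns))
```

Because the numbers are compared as integers, a column such as 'R10' would sort after 'R9' rather than after 'R1'.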
