How to sort each row of a pandas DataFrame and return the column index based on the sorted values of the row

I am trying to sort each row of a pandas DataFrame and get the column labels of the sorted values in a new DataFrame. I can do it, but only in a slow way. Can anyone suggest improvements using parallelization or vectorized code? I have posted an example below.

import pandas as pd

data_url = 'https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv'

# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)

# drop categorical column
gapminder.drop(['country', 'continent'], axis=1, inplace=True) 

# print the first three rows
print(gapminder.head(n=3))

   year         pop  lifeExp   gdpPercap
0  1952   8425333.0   28.801  779.445314
1  1957   9240934.0   30.332  820.853030
2  1962  10267083.0   31.997  853.100710

The result I am looking for is this:

  tag_0 tag_1      tag_2    tag_3
0   pop  year  gdpPercap  lifeExp
1   pop  year  gdpPercap  lifeExp
2   pop  year  gdpPercap  lifeExp

In this case, since pop is always higher than gdpPercap and lifeExp, it always comes first.

I could achieve the required output with the following code, but the computation takes much longer if the DataFrame has a lot of rows or columns.

Can anyone suggest an improvement over this?

def sort_df(df):
    sorted_tags = pd.DataFrame(index = df.index, columns = ['tag_{}'.format(i) for i in range(df.shape[1])])
    for i in range(df.shape[0]):
        sorted_tags.iloc[i,:] = list( df.iloc[i, :].sort_values(ascending=False).index)
    return sorted_tags

sort_df(gapminder)

This is probably as fast as it gets with NumPy:

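import numpy as np
import pandas as pd

def sort_df(df):
    # argsort on the negated values gives, per row, the column positions in descending order;
    # indexing the column names with those positions yields the sorted labels
    return pd.DataFrame(
        data=df.columns.values[np.argsort(-df.values, axis=1)],
        columns=['tag_{}'.format(i) for i in range(df.shape[1])]
    )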

print(sort_df(gapminder.head(3)))

  tag_0 tag_1      tag_2    tag_3
0   pop  year  gdpPercap  lifeExp
1   pop  year  gdpPercap  lifeExp
2   pop  year  gdpPercap  lifeExp

Explanation: np.argsort sorts the values along each row, but returns the indices that would sort the array instead of the sorted values, so the result can be used to co-sort other arrays. The minus sign gives descending order. In your case, you use the indices to reorder the column names. NumPy integer-array indexing takes care of returning the correct shape: indexing the 1-D array of column names with the 2-D index array yields a 2-D array of names.
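To make the two steps concrete, here is a minimal sketch on a hand-built array. The names columns, values, order and ranked are only for illustration, and the numbers are copied from the first two rows of the example above with one column left out for brevity.

```python
import numpy as np

# Illustrative stand-ins for df.columns.values and df.values
columns = np.array(["year", "pop", "lifeExp"])
values = np.array([
    [1952.0, 8425333.0, 28.801],
    [1957.0, 9240934.0, 30.332],
])

# argsort sorts ascending, so negating the values yields descending order
order = np.argsort(-values, axis=1)
print(order)
# [[1 0 2]
#  [1 0 2]]

# Indexing the 1-D name array with the 2-D index array gives a 2-D array of names
ranked = columns[order]
print(ranked)
# [['pop' 'year' 'lifeExp']
#  ['pop' 'year' 'lifeExp']]
```

Each row of order holds column positions, largest value first, and ranked has the same shape as values.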

Runtime is around 3 ms for your example vs. 2.5 s with your function.
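If you want to check the equivalence and the speed difference on your own machine, something like the sketch below should work. It uses a synthetic DataFrame (random values, made-up column names a to d) instead of the gapminder download, and the names sort_df_loop, sort_df_vec and demo are only for illustration. The vectorized variant here also passes index=df.index, which is an addition on my part so that row labels are kept when the index is not 0..n-1. Timings will vary, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd


def sort_df_loop(df):
    # Row-by-row version from the question
    tags = pd.DataFrame(index=df.index,
                        columns=['tag_{}'.format(i) for i in range(df.shape[1])])
    for i in range(df.shape[0]):
        tags.iloc[i, :] = list(df.iloc[i, :].sort_values(ascending=False).index)
    return tags


def sort_df_vec(df):
    # Vectorized version; index=df.index is an assumption-driven addition
    # that preserves the original row labels
    return pd.DataFrame(
        data=df.columns.to_numpy()[np.argsort(-df.to_numpy(), axis=1)],
        index=df.index,
        columns=['tag_{}'.format(i) for i in range(df.shape[1])],
    )


# Synthetic stand-in for the real data (hypothetical size and column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((500, 4)), columns=list("abcd"))

# Both versions should produce the same labels when no row contains ties
same = np.array_equal(sort_df_loop(demo).to_numpy(), sort_df_vec(demo).to_numpy())
print("identical results:", same)

print("loop      :", timeit.timeit(lambda: sort_df_loop(demo), number=1), "s")
print("vectorized:", timeit.timeit(lambda: sort_df_vec(demo), number=1), "s")
```

Note that rows containing equal values may be ordered differently by the two versions, since neither uses a stable sort by default.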
