`pandas.DataFrame.apply` in a row-by-row operation
I would like to return a DataFrame with each row sorted (let's say descending). So if I have the pandas.DataFrame named data:
In [38]: data
Out[38]:
c1 c2 c3 c4 c5 c6
Date
2012-10-22 0.973371 0.226342 0.968282 0.872330 0.273880 0.746156
2012-10-19 0.497048 0.351332 0.310025 0.726669 0.344202 0.878755
2012-10-18 0.315764 0.178584 0.838223 0.749962 0.850462 0.400253
2012-10-17 0.162879 0.068409 0.704094 0.712860 0.537545 0.009789
I would like the following returned:
In [39]: sorted_frame
Out[39]:
0 1 2 3 4 5
Date
2012-10-22 0.973371 0.968282 0.872330 0.746156 0.273880 0.226342
2012-10-19 0.878755 0.726669 0.497048 0.351332 0.344202 0.310025
2012-10-18 0.850462 0.838223 0.749962 0.400253 0.315764 0.178584
2012-10-17 0.712860 0.704094 0.537545 0.162879 0.068409 0.009789
I've tried DataFrame.sort(axis = 1); however, that doesn't achieve the desired result:
In [40]: data.sort(axis = 1)
Out[43]:
c1 c2 c3 c4 c5 c6
Date
2012-10-22 0.973371 0.226342 0.968282 0.872330 0.273880 0.746156
2012-10-19 0.497048 0.351332 0.310025 0.726669 0.344202 0.878755
2012-10-18 0.315764 0.178584 0.838223 0.749962 0.850462 0.400253
2012-10-17 0.162879 0.068409 0.704094 0.712860 0.537545 0.009789
I've created the following function that accomplishes what I'm looking for (using pandas.TimeSeries.order()):
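import numpy
import pandas

def sorted_by_row(frame, ascending = False):
    # start from an all-NaN array with the same shape as the frame
    vals = numpy.tile(numpy.nan, frame.shape)
    for row in numpy.arange(frame.shape[0]):
        # sort each row's values and store them positionally
        vals[row, :] = frame.ix[row, :].order(ascending = ascending)
    return pandas.DataFrame(vals, index = frame.index)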
However, my goal is to be able to use a row-wise function in the DataFrame.apply() method (so I can apply the desired functionality to other functions I build). I've tried:
#TimeSeries.order() sorts a pandas.TimeSeries object
data.apply(lambda x: x.order(), axis = 1)
But again, I'm not getting the desired DataFrame above (I've output enough DataFrames already, so I'll spare the page the real estate).
Your help is greatly appreciated,
-B
Well, it's not too easy to do with pandas out of the box. First, familiarize yourself with argsort:
In [8]: df
Out[8]:
0 1 2 3 4
2012-10-17 1.542735 1.081290 2.602967 0.748706 0.682501
2012-10-18 0.058414 0.148083 0.094104 0.716789 2.482998
2012-10-19 2.396277 0.524733 2.169018 1.365622 0.590767
2012-10-20 0.513535 1.542485 0.186261 2.138740 1.173894
2012-10-21 0.495713 1.401872 0.919931 0.055136 1.358439
2012-10-22 1.010086 0.350249 1.116935 0.323305 0.506086
In [12]: inds = df.values.argsort(1)
In [13]: inds
Out[13]:
array([[4, 3, 1, 0, 2],
[0, 2, 1, 3, 4],
[1, 4, 3, 2, 0],
[2, 0, 4, 1, 3],
[3, 0, 2, 4, 1],
[3, 1, 4, 0, 2]])
These are the indirect sort indices for each row. Now you want to do something like:
import numpy as np
from pandas import DataFrame

new_values = np.empty_like(df.values)
for i, row in enumerate(df.values):
    # sort in descending order
    new_values[i] = row[inds[i]][::-1]
sorted_df = DataFrame(new_values, index=df.index)
Not that satisfying, but it gets the job done:
In [15]: sorted_df
Out[15]:
0 1 2 3 4
2012-10-17 2.602967 1.542735 1.081290 0.748706 0.682501
2012-10-18 2.482998 0.716789 0.148083 0.094104 0.058414
2012-10-19 2.396277 2.169018 1.365622 0.590767 0.524733
2012-10-20 2.138740 1.542485 1.173894 0.513535 0.186261
2012-10-21 1.401872 1.358439 0.919931 0.495713 0.055136
2012-10-22 1.116935 1.010086 0.506086 0.350249 0.323305
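The row loop above can also be written without Python-level iteration. A minimal sketch, assuming a NumPy version that provides `np.take_along_axis` (1.15 or later) and using small made-up values rather than the frame shown above:

```python
import numpy as np
import pandas as pd

# Illustrative data (not the frame from the answer above)
df = pd.DataFrame(
    [[1.5, 1.0, 2.6], [0.05, 0.14, 0.09]],
    index=pd.to_datetime(["2012-10-17", "2012-10-18"]),
)

values = df.to_numpy()
inds = values.argsort(axis=1)   # ascending indirect sort indices per row
desc = inds[:, ::-1]            # reverse each row of indices for descending order
new_values = np.take_along_axis(values, desc, axis=1)

sorted_df = pd.DataFrame(new_values, index=df.index)
```

This applies the same indirect indices as the loop, just for all rows at once.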
More generally you can do:
In [23]: df.apply(lambda x: np.sort(x.values)[::-1], axis=1)
Out[23]:
0 1 2 3 4
2012-10-17 2.602967 1.542735 1.081290 0.748706 0.682501
2012-10-18 2.482998 0.716789 0.148083 0.094104 0.058414
2012-10-19 2.396277 2.169018 1.365622 0.590767 0.524733
2012-10-20 2.138740 1.542485 1.173894 0.513535 0.186261
2012-10-21 1.401872 1.358439 0.919931 0.495713 0.055136
2012-10-22 1.116935 1.010086 0.506086 0.350249 0.323305
But you're responsible for assigning new columns yourself.
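In more recent pandas releases, a function that returns a plain array from a row-wise `apply` may yield a Series of arrays instead of a DataFrame. A hedged sketch, assuming pandas 0.23 or later, where `apply` accepts `result_type="expand"`; the helper name `sort_row_desc` and the sample values are illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

def sort_row_desc(row):
    """Return the row's values sorted in descending order (hypothetical helper)."""
    return np.sort(row.to_numpy())[::-1]

# Illustrative data (not the frame from the answer above)
df = pd.DataFrame(
    [[1.5, 1.0, 2.6], [0.05, 0.14, 0.09]],
    index=pd.to_datetime(["2012-10-17", "2012-10-18"]),
)

# result_type="expand" spreads each returned array across positional columns
sorted_df = df.apply(sort_row_desc, axis=1, result_type="expand")
```

The resulting columns are positional, so any meaningful column labels still have to be assigned by hand.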
Sorting is a big subject, and I'm sure there are many ways to do this. Here's one.
First, create an example DataFrame.
In [31]: rndrange = pd.date_range(start='10/17/2012', end='10/22/2012', freq='D')
In [32]: df = pd.DataFrame(np.random.randn(len(rndrange),5),index=rndrange)
In [33]: df = df.abs() #Easier to see sorting if all vals are positive
In [34]: df
Out[34]:
0 1 2 3 4
2012-10-17 1.542735 1.081290 2.602967 0.748706 0.682501
2012-10-18 0.058414 0.148083 0.094104 0.716789 2.482998
2012-10-19 2.396277 0.524733 2.169018 1.365622 0.590767
2012-10-20 0.513535 1.542485 0.186261 2.138740 1.173894
2012-10-21 0.495713 1.401872 0.919931 0.055136 1.358439
2012-10-22 1.010086 0.350249 1.116935 0.323305 0.506086
Sorting:
In [35]: df.as_matrix().sort(1)
In [36]: df
Out[36]:
0 1 2 3 4
2012-10-17 0.682501 0.748706 1.081290 1.542735 2.602967
2012-10-18 0.058414 0.094104 0.148083 0.716789 2.482998
2012-10-19 0.524733 0.590767 1.365622 2.169018 2.396277
2012-10-20 0.186261 0.513535 1.173894 1.542485 2.138740
2012-10-21 0.055136 0.495713 0.919931 1.358439 1.401872
2012-10-22 0.323305 0.350249 0.506086 1.010086 1.116935
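The in-place sort above overwrites `df` and appears to rely on the frame exposing its values as a writable view. A sketch of a non-mutating variant, assuming it is acceptable to build a new frame; the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data (not the frame from the answer above)
df = pd.DataFrame(
    [[1.5, 1.0, 2.6], [0.05, 0.14, 0.09]],
    index=pd.to_datetime(["2012-10-17", "2012-10-18"]),
)

# np.sort returns a sorted copy, so df itself is left untouched
ascending = np.sort(df.to_numpy(), axis=1)
descending = ascending[:, ::-1]   # flip columns for descending order

sorted_df = pd.DataFrame(descending, index=df.index)
```

Reversing the column order of the ascending result gives the descending layout asked for in the question.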