Sorting a pandas DataFrame in a vectorized way

The sample input DataFrame is as follows:

import numpy as np
import pandas as pd

df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,np.nan], [np.nan,np.nan,np.nan,np.nan,np.nan]], columns=["A", "B","C","D","E"])

Expected output:

df_output=pd.DataFrame([[-1,-1,0,1,1],[-1,-1,0,1,1],[-1,1,-1,1,0],[0,0,0,0,0]],columns=["A", "B","C","D","E"])

Here is what I am trying to do:
1) Sort every row.

2) Assign -1 to the smaller half of the valid observations and +1 to the larger half.

3) NaN values get zero.

4) With an odd number of columns, the median gets zero.

The following code works well for an even number of columns:

df_input[:] = np.where(df_input.rank(axis=1) > df_input.shape[1] / 2, 1, -1)

How do I extend this to an odd number of columns and account for NaN? Thanks in advance.

I believe you need numpy.select:

a = df_input.rank(axis=1)
x = df_input.shape[1] / 2

m1 = a < x
m2 = a > x
m3 = (a.eq(a.mean(axis=1), axis=0))

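# np.select picks the first matching condition per cell; unmatched cells (NaN ranks) get the default 0
df = pd.DataFrame(np.select([m3, m2, m1], [0, 1, -1], 0), index=df_input.index, columns=df_input.columns)
print(df)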
   A  B  C  D  E
0 -1 -1  0  1  1
1 -1 -1  0  1  1
2 -1  1 -1  1  0
3  0  0  0  0  0
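
The fixed threshold `x` above is derived from the total number of columns, which works for the sample data but may not hold when a row has several NaN values (for example, a row with only two valid values would get -1 for both). A minimal sketch, assuming the intended rule is "compare each rank with the middle rank of the valid values in that row", could compute the threshold per row instead. The helper name `half_sign` is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def half_sign(df):
    """Hypothetical helper: -1 for the lower half of each row, +1 for the
    upper half, 0 for the median and for NaN cells."""
    ranks = df.rank(axis=1)            # NaN cells keep a NaN rank
    n_valid = df.count(axis=1)         # number of non-NaN values per row
    midpoint = (n_valid + 1) / 2       # middle rank of the valid values
    diff = ranks.sub(midpoint, axis=0) # negative below, positive above the middle
    return np.sign(diff).fillna(0).astype(int)


df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [2, 1, 4, 7, 6],
     [5, 6, 3, 7, np.nan],
     [np.nan] * 5],
    columns=["A", "B", "C", "D", "E"],
)

out = half_sign(df_input)
print(out)
```

On the sample input this reproduces the expected output from the question. Tied values share an averaged rank, so how ties should be split between -1 and +1 is left open here.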

It seems that you are using a string for the NaN ('NAN').

Then:

df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,np.nan], [np.nan,np.nan,np.nan,np.nan,np.nan]], columns=["A", "B","C","D","E"])

df_input

     A    B    C    D    E
0  1.0  2.0  3.0  4.0  5.0
1  2.0  1.0  4.0  7.0  6.0
2  5.0  6.0  3.0  7.0  NaN
3  NaN  NaN  NaN  NaN  NaN

df2 = df_input.copy()
df2[:] = np.where(df2.isna(), 0, np.where(df2.rank(axis=1) > df2.shape[1] / 2, 1, -1))

df2

   A  B  C  D  E
0 -1 -1  1  1  1
1 -1 -1  1  1  1
2 -1  1 -1  1  0
3  0  0  0  0  0
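
The output above zeroes the NaN cells but still gives 1 to the middle value in rows 0 and 1, so it differs from the expected output in the question. A small sketch of how one might check a candidate result against `df_output` cell by cell follows; the variable names `candidate`, `mismatch` and `bad_cells` are illustrative only.

```python
import numpy as np
import pandas as pd

cols = ["A", "B", "C", "D", "E"]
df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [2, 1, 4, 7, 6],
     [5, 6, 3, 7, np.nan],
     [np.nan] * 5],
    columns=cols,
)
df_output = pd.DataFrame(
    [[-1, -1, 0, 1, 1],
     [-1, -1, 0, 1, 1],
     [-1, 1, -1, 1, 0],
     [0, 0, 0, 0, 0]],
    columns=cols,
)

# Same nested np.where as above, wrapped in a new DataFrame
values = np.where(
    df_input.isna(),
    0,
    np.where(df_input.rank(axis=1) > df_input.shape[1] / 2, 1, -1),
)
candidate = pd.DataFrame(values, index=df_input.index, columns=cols)

# Boolean frame that is True wherever the candidate disagrees with the target
mismatch = candidate.ne(df_output)
rows, col_idx = np.nonzero(mismatch.to_numpy())
bad_cells = [(int(r), cols[c]) for r, c in zip(rows, col_idx)]
print(bad_cells)  # [(0, 'C'), (1, 'C')]
```

Only the median cells in column C of the two complete rows disagree, which suggests the NaN mask is fine and the remaining gap is the missing zero for the median.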

