Sorting a pandas DataFrame in a vectorized way
The sample input DataFrame is as follows:
df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,np.nan], [np.nan,np.nan,np.nan,np.nan,np.nan]], columns=["A", "B", "C", "D", "E"])
Expected output:
df_output=pd.DataFrame([[-1,-1,0,1,1],[-1,-1,0,1,1],[-1,1,-1,1,0],[0,0,0,0,0]],columns=["A", "B","C","D","E"])
Here is what I am trying to do:
1) Sort every row.
2) Assign -1 to the smaller half of the valid (non-NaN) observations and +1 to the larger half.
3) NaN values should be 0.
4) If the number of valid observations in a row is odd, the median should be 0.
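For reference, the four steps can be written as a plain row-by-row loop. This is a minimal, non-vectorized sketch, useful mainly as a correctness check for faster versions. The helper name signs_by_row_loop is hypothetical, and ties are broken by column position (stable sort), which is an assumption since the question does not say how ties should be handled.

```python
import numpy as np
import pandas as pd


def signs_by_row_loop(df):
    """Hypothetical helper: label each row's valid values -1 (smaller half),
    +1 (larger half) or 0 (median of an odd count); NaN positions get 0."""
    out = np.zeros(df.shape, dtype=int)  # step 3: NaN positions simply stay 0
    for i, row in enumerate(df.to_numpy(dtype=float)):
        valid = np.flatnonzero(~np.isnan(row))                # columns holding real values
        order = valid[np.argsort(row[valid], kind="stable")]  # step 1: sort the row
        n, half = len(order), len(order) // 2
        out[i, order[:half]] = -1                              # step 2: smaller half
        out[i, order[n - half:]] = 1                           # step 2: larger half
        # step 4: for odd n, the middle position order[half] keeps its 0
    return pd.DataFrame(out, index=df.index, columns=df.columns)


df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, np.nan], [np.nan] * 5],
    columns=["A", "B", "C", "D", "E"],
)
print(signs_by_row_loop(df_input))
```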
The following code works for an even number of columns:
# rank the values within each row; ranks above the midpoint get +1, the rest -1
df_input[:] = np.where(df_input.rank(axis=1) > df_input.shape[1] / 2, 1, -1)
How do I extend this to an odd number of columns and account for NaN? Thanks in advance.
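To see why the one-liner only covers the even case, here is a small sketch that runs the same expression on the sample input, writing to a new DataFrame instead of overwriting df_input. DataFrame.rank leaves NaN as NaN and NaN > 2.5 is False, so missing values end up as -1; in a full row the median has rank 3, which is above 2.5, so it ends up as +1.

```python
import numpy as np
import pandas as pd

df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, np.nan], [np.nan] * 5],
    columns=["A", "B", "C", "D", "E"],
)

ranks = df_input.rank(axis=1)  # ranks within each row; NaN stays NaN
even_only = pd.DataFrame(
    np.where(ranks > df_input.shape[1] / 2, 1, -1),  # same rule as the one-liner
    index=df_input.index,
    columns=df_input.columns,
)
print(ranks)
print(even_only)  # median (column C, rows 0-1) -> +1, NaN cells -> -1
```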
I believe you need numpy.select:
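import numpy as np
import pandas as pd
a = df_input.rank(axis=1)                    # rank within each row; NaN stays NaN
x = df_input.shape[1] / 2                    # fixed midpoint: number of columns / 2
m1 = a < x                                   # lower half -> -1
m2 = a > x                                   # upper half -> +1
m3 = (a.eq(a.mean(axis=1), axis=0))          # rank equal to the row's mean rank (the median) -> 0
# np.select takes the first matching condition, so m3 must come before m2; NaN matches nothing -> default 0
df = pd.DataFrame(np.select([m3, m2, m1], [0, 1, -1], 0), columns=df_input.columns)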
print (df)
A B C D E
0 -1 -1 0 1 1
1 -1 -1 0 1 1
2 -1 1 -1 1 0
3 0 0 0 0 0
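One caveat, based on tracing the code above: x = df_input.shape[1] / 2 is 2.5 for every row, which happens to equal the midpoint (n + 1) / 2 for the row with four valid values. In a row with exactly two valid values, both ranks (1 and 2) fall below 2.5, so both would get -1. A minimal sketch that takes the midpoint per row from the mean of the valid ranks instead; with the default average ranking, ranks 1..n always sum to n(n + 1) / 2, so that mean is (n + 1) / 2. The helper name signs_by_rank is hypothetical.

```python
import numpy as np
import pandas as pd


def signs_by_rank(df):
    """Hypothetical helper: vectorized -1/0/+1 labelling of each row."""
    ranks = df.rank(axis=1)        # average ranks within each row; NaN stays NaN
    mid = ranks.mean(axis=1)       # per-row midpoint (n_valid + 1) / 2; NaN for an all-NaN row
    below = ranks.lt(mid, axis=0)  # smaller half
    above = ranks.gt(mid, axis=0)  # larger half
    # The median of an odd count, NaN cells and all-NaN rows match neither mask -> 0
    labels = np.select([above, below], [1, -1], default=0)
    return pd.DataFrame(labels, index=df.index, columns=df.columns)


df_input = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 1, 4, 7, 6], [5, 6, 3, 7, np.nan], [np.nan] * 5],
    columns=["A", "B", "C", "D", "E"],
)
two_valid = pd.DataFrame([[np.nan, 3, np.nan, 8, np.nan]], columns=["A", "B", "C", "D", "E"])

print(signs_by_rank(df_input))
print(signs_by_rank(two_valid))  # B -> -1, D -> +1, the rest 0
```

Tied values share an averaged rank, so for example a row 1, 2, 2, 3 gives -1, 0, 0, 1; the question does not say whether that is the desired treatment of ties.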
It seems that you are using a string for NaN ('NAN'). With a real np.nan instead:
import numpy as np
import pandas as pd
df_input = pd.DataFrame([[1,2,3,4,5], [2,1,4,7,6], [5,6,3,7,np.nan], [np.nan,np.nan,np.nan,np.nan,np.nan]], columns=["A", "B", "C", "D", "E"])
df_input
A B C D E
0 1.0 2.0 3.0 4.0 5.0
1 2.0 1.0 4.0 7.0 6.0
2 5.0 6.0 3.0 7.0 NaN
3 NaN NaN NaN NaN NaN
df2 = df_input.copy()
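# NaN -> 0; otherwise +1 if the row rank is above the midpoint, else -1
# note: in a full row the median has rank 3 > 2.5, so it gets +1 rather than 0
df2[:] = np.where(df2.isna(), 0, np.where(df2.rank(axis=1) > df2.shape[1] / 2, 1, -1))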
df2
A B C D E
0 -1 -1 1 1 1
1 -1 -1 1 1 1
2 -1 1 -1 1 0
3 0 0 0 0 0
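If the data really does contain the string 'NAN' rather than np.nan, df2.isna() does not flag those cells as missing. One way to convert them first, assuming every column is meant to be numeric, is pd.to_numeric with errors="coerce" applied column by column; the small input below is hypothetical.

```python
import pandas as pd

# Hypothetical input where missing values were entered as the string "NAN"
raw = pd.DataFrame(
    [[5, 6, 3, 7, "NAN"], ["NAN", "NAN", "NAN", "NAN", "NAN"]],
    columns=["A", "B", "C", "D", "E"],
)
print(raw.isna().to_numpy().sum())  # the strings are not treated as missing

# Convert each column; anything that cannot be parsed as a number becomes NaN
clean = raw.apply(pd.to_numeric, errors="coerce")
print(clean.dtypes)
print(clean.isna().to_numpy().sum())
```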