Pandas - Replace other columns in row with 0 if a specific column has a value of 1
Here is an example dataframe:
X Y Z
1 0 1
0 1 0
1 1 1
Now, here is the rule I've come up with:
The final dataframe should look like this:
X Y Z
0 0 1
0 1 0
0 0 1
My first thought at a solution is this:
df_null_list = ['X']
for i in ['Y', 'Z']:
    df[df[i] == 1][df_null_list] = 0
    df_null_list.append(i)
When I do this and sum across the y axis, I'm starting to get values of 2 and 4, which don't make sense. Note, I'm referring to when I ran this on the actual dataset.
Do you have any suggestions for improvements or alternative solutions?
df['X'] = df['X'].mask(df.Y == 1, 0)
df[['X', 'Y']] = df[['X', 'Y']].mask(df.Z == 1, 0)
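For readers who want to run the two lines above, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; the explanatory comments are my reading of how `mask` behaves here, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"X": [1, 0, 1], "Y": [0, 1, 1], "Z": [1, 0, 1]})

# Series.mask(cond, other): wherever cond is True, replace the value with other.
df["X"] = df["X"].mask(df["Y"] == 1, 0)

# With a boolean Series as the condition, DataFrame.mask applies it row by row,
# so X and Y are both zeroed in every row where Z == 1.
df[["X", "Y"]] = df[["X", "Y"]].mask(df["Z"] == 1, 0)

print(df)
```

Each call returns a new object, which is why the result is assigned back to the same columns.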
Another solution with DataFrame.loc:
df.loc[df.Y == 1, 'X'] = 0
df.loc[df.Z == 1, ['X', 'Y']] = 0
print (df)
X Y Z
0 0 0 1
1 0 1 0
2 0 0 1
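The same `.loc` pattern also repairs the loop from the question. The original line `df[df[i] == 1][df_null_list] = 0` is chained indexing: the first `df[...]` returns a new object, so the assignment never reaches `df`. A sketch of a loop version follows, assuming the column order in the frame is the priority order (later columns win); the names `cols` and `pos` are illustrative only.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"X": [1, 0, 1], "Y": [0, 1, 1], "Z": [1, 0, 1]})

cols = list(df.columns)  # assumed priority order: later columns win
for pos in range(1, len(cols)):
    # Rows where the current column is 1 get every earlier column set to 0.
    # A single .loc call selects rows and columns together, so it writes to df itself.
    df.loc[df[cols[pos]] == 1, cols[:pos]] = 0

print(df)
```

This scales to any number of columns without writing one line per rule.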
You can generalize this to wanting the last index of 1 per row to remain 1, and leave everything else as 0. For performance, operate on the underlying numpy array:
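import numpy as np

a = df.values
# reverse the columns so argmax finds the last 1 per row, then map back to the original position
idx = (a.shape[1] - a[:, ::-1].argmax(1)) - 1
t = np.zeros(a.shape)
t[np.arange(a.shape[0]), idx] = 1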
array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]])
If you need the result back as a DataFrame:
pd.DataFrame(t, columns=df.columns, index=df.index).astype(int)
X Y Z
0 0 0 1
1 0 1 0
2 0 0 1
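One caveat with the argmax approach: for a row that contains no 1 at all, `argmax` returns 0 on the reversed array, so the code above would place a 1 in the last column of that row. The sample data has no such row, but a real dataset might. Below is a hedged variant that leaves all-zero rows untouched; the function name `keep_last_one` and the extra all-zero row are my additions for illustration.

```python
import numpy as np
import pandas as pd

def keep_last_one(frame):
    """Keep only the last 1 in each row; rows without any 1 stay all zero."""
    a = frame.to_numpy()
    n_rows, n_cols = a.shape
    # argmax on the column-reversed boolean array locates the last 1 of each row.
    idx = n_cols - 1 - (a[:, ::-1] == 1).argmax(axis=1)
    out = np.zeros_like(a)
    rows = np.arange(n_rows)
    # Write 1 only if the located cell really held a 1 (it holds 0 for all-zero rows).
    out[rows, idx] = (a[rows, idx] == 1).astype(a.dtype)
    return pd.DataFrame(out, columns=frame.columns, index=frame.index)

# Question's data plus one all-zero row to exercise the edge case.
df = pd.DataFrame({"X": [1, 0, 1, 0], "Y": [0, 1, 1, 0], "Z": [1, 0, 1, 0]})
result = keep_last_one(df)
print(result)
```

Because `np.zeros_like` keeps the input dtype, no `astype(int)` is needed on the way back to a DataFrame.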
Another solution could be to perform an expanding operation on the rows axis using numpy:
df1 = df.copy() == 1
df1.iloc[:,::-1].expanding(axis=1).apply(
    lambda x: x[-1] * np.prod(np.logical_not(x[:-1]))
).iloc[:,::-1]
X Y Z
0 0.0 0.0 1.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
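The `axis` argument of `expanding` is deprecated in recent pandas releases, so the snippet above may warn or fail depending on the installed version. The same idea (a cell survives only if it is 1 and nothing to its right is 1) can be sketched with a reversed cumulative sum instead. This is a swapped-in technique, not the answerer's code, and the names `ones` and `seen` are illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"X": [1, 0, 1], "Y": [0, 1, 1], "Z": [1, 0, 1]})

ones = df.eq(1).astype(int)
# For each cell, count the 1s at or to the right of it:
# reverse the columns, take a running sum along each row, reverse back.
seen = ones.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
# A cell is the last 1 of its row when it is 1 and that count is exactly 1.
result = (ones.eq(1) & seen.eq(1)).astype(int)

print(result)
```

This version returns integers directly and leaves rows without any 1 as all zeros.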