Pandas: create new column using condition df[variable].isnull on a list of variables
I want to create a mask column where 1 indicates that there is data in every column of a chosen set, and 0 indicates that at least one column in that set is blank.
 A   B    C    D    E  mask1
 0  13    2   45   96      1
 1  14    2   45   96      1
 2  15    9  1.0  NaN      1
 3  16    9  1.0  NaN      1
 4  17    5  0.0  NaN      1
 5  18    6  1.0  967      1
 6  19    6  1.0  976      1
 7  20    9  1.0  294      1
 8  21    5  0.0  372      1
 9  13    5  NaN  170      0
10  62    5  NaN  100      0
11  22   20  NaN  170      0
12  13  NaN  0.0  996      0
I managed to do it using the following code:
df2["mask1"] = np.where((df2['C'].isnull() | df2['D'].isnull()) , 0, 1)
Now I want to automate this for a larger dataframe with more variables, i.e., I want to specify which variables to use for the mask. I was thinking of creating a list of variables such as
var = ['C', 'D', 'E']
which I could use to perform this operation, but I am not sure how to apply the code above to this list. With a for loop?
Select the columns and apply isnull or notnull:
cols = ['C', 'D', 'E']
# 1 only where every selected column is non-null in that row, otherwise 0
df['mask1'] = df[cols].notnull().all(axis=1).astype(int)
     A   B     C     D      E  mask1
 0   0  13   2.0  45.0   96.0      1
 1   1  14   2.0  45.0   96.0      1
 2   2  15   9.0   1.0    NaN      0
 3   3  16   9.0   1.0    NaN      0
 4   4  17   5.0   0.0    NaN      0
 5   5  18   6.0   1.0  967.0      1
 6   6  19   6.0   1.0  976.0      1
 7   7  20   9.0   1.0  294.0      1
 8   8  21   5.0   0.0  372.0      1
 9   9  13   5.0   NaN  170.0      0
10  10  62   5.0   NaN  100.0      0
11  11  22  20.0   NaN  170.0      0
12  12  13   NaN   0.0  996.0      0
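For anyone who wants to try this end to end, here is a minimal self-contained sketch. The small dataframe is made up for illustration (it is not the full data from the question), and the mask2 column is just an extra name I introduced to show the isnull variant, which stays closer to the np.where code in the question. No for loop is needed, because the row-wise reduction with axis=1 handles any number of columns in the list.

```python
import numpy as np
import pandas as pd

# Illustrative data only; a shortened stand-in for the dataframe in the question.
df = pd.DataFrame({
    "A": [0, 1, 2, 3],
    "B": [13, 14, 15, 13],
    "C": [2.0, 2.0, 9.0, np.nan],
    "D": [45.0, np.nan, 1.0, 0.0],
    "E": [96.0, 96.0, np.nan, 996.0],
})

# Columns to check; change this list to cover other variables.
cols = ["C", "D", "E"]

# notnull variant: 1 where every listed column has data, else 0.
df["mask1"] = df[cols].notnull().all(axis=1).astype(int)

# isnull variant (mask2 is a hypothetical column name): 0 where any listed column is missing.
df["mask2"] = np.where(df[cols].isnull().any(axis=1), 0, 1)

print(df)
```

Both columns should come out identical, since "all not null" is the negation of "any null".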