Check if column is in a list, remove if not and add value to a new column
I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 1, 0, 1],
    "B": [1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 1, 1, 0],
    "D": [1, 1, 0, 0, 0, 1],
})
which looks like this:
   A  B  C  D
0  0  1  0  1
1  0  0  0  1
2  1  0  0  0
3  1  1  1  0
4  0  1  1  0
5  1  0  0  1
I have a list of columns I wish to keep: allowed_columns = ["A","B"]. This means we get rid of C and D. However, when dropping the columns, if a row has the value 1 in any of them, I want to record that in a new column lost. This is what I'm trying to achieve:
   A  B  lost
0  0  1     1
1  0  0     1
2  1  0     0
3  1  1     1
4  0  1     1
5  1  0     1
To keep the problem simple, we can assume that C and D cannot have the value 1 simultaneously. How can I achieve this?
Subset to the allowed columns, then take the row-wise max of everything you removed, which you can select with df.columns.difference:
df = (df[allowed_columns]
      .assign(lost=df[df.columns.difference(allowed_columns)].max(axis=1)))
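For reference, here is a self-contained sketch of the same idea, assuming the sample frame from the question. The names dropped and result are illustrative and not part of the original answer, and the axis is spelled out as axis=1 for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 1, 0, 1],
    "B": [1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 1, 1, 0],
    "D": [1, 1, 0, 0, 0, 1],
})
allowed_columns = ["A", "B"]

# Columns that are about to be dropped: everything not in allowed_columns.
# "dropped" is an illustrative name, not from the original answer.
dropped = df.columns.difference(allowed_columns)

# The row-wise max over the dropped 0/1 columns is 1 if any of them held a 1.
result = df[allowed_columns].assign(lost=df[dropped].max(axis=1))
print(result)
```

Because the right-hand side of assign is evaluated against the original df, the dropped columns are still available when lost is computed.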
Let us do:
df['lost'] = df[['C', 'D']].max(axis=1)
df = df.drop(['C', 'D'], axis=1)
Another option is groupby:
d = dict.fromkeys({*df} - {*allowed_columns}, 'lost')
df.groupby(lambda x: d.get(x, x), axis=1).max()
   A  B  lost
0  0  1     1
1  0  0     1
2  1  0     0
3  1  1     1
4  0  1     1
5  1  0     1
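As far as I know, recent pandas releases deprecate the axis=1 argument of DataFrame.groupby, so the call above may warn or fail depending on your version. A minimal sketch of one way around that, assuming the sample frame from the question, is to transpose, group on the index, and transpose back. The name result is illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 1, 0, 1],
    "B": [1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 1, 1, 0],
    "D": [1, 1, 0, 0, 0, 1],
})
allowed_columns = ["A", "B"]

# Map every column that is not allowed to the single label "lost".
d = dict.fromkeys(set(df.columns) - set(allowed_columns), "lost")

# After transposing, the original column names are the index, so a plain
# groupby on the index replaces groupby(..., axis=1). Allowed columns keep
# their own name as the group key; the rest collapse into "lost".
result = df.T.groupby(lambda x: d.get(x, x)).max().T
print(result)
```

Transposing copies the data, so this is a convenience for small frames rather than something tuned for performance.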
df['lost']=((df['C']==1)|(df['D']==1)).astype(int)
df.drop(['C','D'],axis=1,inplace=True)
You can use two boolean conditions combined with OR (the | operator) to define the values in df['lost']. I think it is also intuitive, because (df['C']==1)|(df['D']==1) will be True if you have 1 in either column C or column D; otherwise it will be False. astype(int) then converts True to 1 and False to 0.
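If there are more than two columns to drop, the boolean approach above can be generalized without writing each comparison by hand. This is only a sketch, assuming the sample frame and allowed_columns from the question; dropped is an illustrative name that does not appear in the original answers.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 1, 1, 0, 1],
    "B": [1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 1, 1, 0],
    "D": [1, 1, 0, 0, 0, 1],
})
allowed_columns = ["A", "B"]

# Every column that is not explicitly allowed will be dropped.
dropped = [c for c in df.columns if c not in allowed_columns]

# eq(1) gives a boolean frame; any(axis=1) is the row-wise OR across it.
df["lost"] = df[dropped].eq(1).any(axis=1).astype(int)

# Remove the dropped columns, keeping the allowed ones plus "lost".
df = df.drop(columns=dropped)
print(df)
```

Unlike the max-based answers, this only flags the exact value 1, which matters if the dropped columns can hold values other than 0 and 1.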