Check whether a column value is in other columns in pandas

Question

I have the following DataFrame in pandas:

  target   A       B      C
0 cat      bridge  cat    brush  
1 brush    dog     cat    shoe
2 bridge   cat     shoe   bridge

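How do I test whether df.target is in any of the columns ['A','B','C', etc.], where there are many columns to check?

I tried merging A, B and C into a single string so that I could use df.abcstring.str.contains(df.target), but this does not work.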

Answer 1

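You can use drop, isin and any.

drop the target column, so that df contains only the A, B and C columns
check whether the values are in (isin) the target column
and check whether there are any hits

That's it.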

# axis must be passed by keyword in current pandas (positional form removed in 2.0)
df["exists"] = df.drop(columns="target").isin(df["target"]).any(axis=1)
print(df)

    target  A       B       C       exists
0   cat     bridge  cat     brush   True
1   brush   dog     cat     shoe    False
2   bridge  cat     shoe    bridge  True
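
The result above depends on what is passed to isin. As a minimal sketch on the question's data (the variable names others, row_wise and any_target are only illustrative): when isin receives a Series, pandas aligns it on the index, so each row is compared with its own target; when it receives a plain list, every cell is tested against all target values.

```python
import pandas as pd

df = pd.DataFrame({
    "target": ["cat", "brush", "bridge"],
    "A": ["bridge", "dog", "cat"],
    "B": ["cat", "cat", "shoe"],
    "C": ["brush", "shoe", "bridge"],
})

others = df.drop(columns="target")

# Series argument: aligned on the index, so row i is compared with target[i] only
row_wise = others.isin(df["target"]).any(axis=1)

# List argument: each cell is tested against the whole set of target values
any_target = others.isin(df["target"].tolist()).any(axis=1)

print(row_wise.tolist())    # [True, False, True]
print(any_target.tolist())  # [True, True, True]
```

The Series form relies on df["target"] having a unique index that matches the index of df.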

Answer 2

A one-hot encoding approach:

In [165]: x = pd.get_dummies(df.drop(columns='target'), prefix='', prefix_sep='', dtype=int)

In [166]: x
Out[166]:
   bridge  cat  dog  cat  shoe  bridge  brush  shoe
0       1    0    0    1     0       0      1     0
1       0    0    1    1     0       0      0     1
2       0    1    0    0     1       1      0     0

In [167]: x[df['target']].eq(1).any(axis=1)
Out[167]:
0    True
1    True
2    True
dtype: bool

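Explanation: x[df['target']] selects the dummy columns for every value in target, so each row is tested against all target values rather than only its own. That is why row 1 is True above.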

In [168]: x[df['target']]
Out[168]:
   cat  cat  brush  bridge  bridge
0    0    1      1       1       0
1    0    1      0       0       0
2    1    0      0       0       1
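
If the goal is to test each row only against its own target, the dummy columns can still be used, but the lookup has to be done per row. The following is a rough sketch of one way to do that, not the answerer's method; the names present, col_idx and row_wise are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "target": ["cat", "brush", "bridge"],
    "A": ["bridge", "dog", "cat"],
    "B": ["cat", "cat", "shoe"],
    "C": ["brush", "shoe", "bridge"],
})

dummies = pd.get_dummies(df.drop(columns="target"), prefix="", prefix_sep="", dtype=int)

# Collapse duplicate labels: 1 if the value occurs anywhere in the row
present = dummies.T.groupby(level=0).max().T

# Guarantee a column for every target value, filled with 0 if it never occurs
all_labels = present.columns.union(pd.Index(df["target"].unique()))
present = present.reindex(columns=all_labels, fill_value=0)

# For each row, pick the indicator that belongs to that row's own target
col_idx = present.columns.get_indexer(df["target"])
row_wise = present.to_numpy()[np.arange(len(df)), col_idx].astype(bool)

print(row_wise.tolist())  # [True, False, True]
```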

Answer 3

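If you need to check row by row, you can use eq together with drop or pop. Note that pop removes target from df in place, so each snippet below starts from the original df:

mask = df.eq(df.pop('target'), axis=0)
print(mask)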
       A      B      C
0  False   True  False
1  False  False  False
2  False  False   True

Then, if you need to check for at least one True, add any:

mask = df.eq(df.pop('target'), axis=0).any(axis=1)
print (mask)
0     True
1    False
2     True
dtype: bool

df['new'] = df.eq(df.pop('target'), axis=0).any(axis=1)
print (df)
        A     B       C    new
0  bridge   cat   brush   True
1     dog   cat    shoe  False
2     cat  shoe  bridge   True

But if you need to check against all values in the target column, use isin:

mask = df.isin(df.pop('target').values.tolist())
print (mask)
       A      B      C
0   True   True   True
1  False   True  False
2   True  False   True

If you want to check whether all values are True, add all:

df['new'] = df.isin(df.pop('target').values.tolist()).all(axis=1)
print (df)
        A     B       C    new
0  bridge   cat   brush   True
1     dog   cat    shoe  False
2     cat  shoe  bridge  False
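
Because pop mutates df, the snippets above cannot be run one after another on the same frame. A minimal sketch of the row-wise check that leaves target in place, assuming the question's data (others is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "target": ["cat", "brush", "bridge"],
    "A": ["bridge", "dog", "cat"],
    "B": ["cat", "cat", "shoe"],
    "C": ["brush", "shoe", "bridge"],
})

# drop returns a new frame, so df itself keeps its target column
others = df.drop(columns="target")

# axis=0 aligns the target Series with the rows, giving a cell-by-cell comparison
mask = others.eq(df["target"], axis=0)

df["new"] = mask.any(axis=1)
print(df)
```

Here df["new"] comes out as True, False, True, and df still contains target afterwards.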

Answer 4

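You can use apply to evaluate a function for each row that counts how many values match the value in the "target" column. The target cell itself is counted, so a count greater than 1 means the value also appears in another column:

df["exist"] = df.apply(lambda row: row.value_counts()[row['target']] > 1, axis=1)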

For a DataFrame that looks like this:

   b  c target
0  3  a      a
1  3  4      2
2  3  4      2
3  3  4      2
4  3  4      4

the output will be:

   b  c target  exist
0  3  a      a   True
1  3  4      2  False
2  3  4      2  False
3  3  4      2  False
4  3  4      4   True
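
A variant of the same per-row idea, sketched here on the question's data, removes the target cell from the row before comparing, so no "greater than 1" adjustment is needed. The helper name target_in_other_columns is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "target": ["cat", "brush", "bridge"],
    "A": ["bridge", "dog", "cat"],
    "B": ["cat", "cat", "shoe"],
    "C": ["brush", "shoe", "bridge"],
})

def target_in_other_columns(row):
    # Compare the row's target with every other cell in the same row
    return row.drop("target").eq(row["target"]).any()

df["exist"] = df.apply(target_in_other_columns, axis=1)
print(df["exist"].tolist())  # [True, False, True]
```

Like any apply with axis=1, this runs a Python function per row, so it is likely to be slower than the vectorised eq or isin versions on large frames.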

Answer 5

Another approach, using the Index difference method:

matches = df[df.columns.difference(['target'])].eq(df['target'], axis = 0)

#       A      B      C
#0  False   True  False
#1  False  False  False
#2  False  False   True

# Check if at least one match:
matches.any(axis = 1)

#Out[30]: 
#0     True
#1    False
#2     True

If you want to see which columns match the target, here is one possible solution:

import numpy as np  # needed for np.where

matches.apply(lambda x: ", ".join(x.index[np.where(x.tolist())]), axis = 1)

Out[53]: 
0    B
1     
2    C
dtype: object
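
The same list of matching columns can also be built without apply. This is only a sketch under the assumption that the column labels are strings; matched_cols is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "target": ["cat", "brush", "bridge"],
    "A": ["bridge", "dog", "cat"],
    "B": ["cat", "cat", "shoe"],
    "C": ["brush", "shoe", "bridge"],
})

matches = df.drop(columns="target").eq(df["target"], axis=0)

# Each row of the boolean array is used as a mask over the column labels
matched_cols = pd.Series(
    [", ".join(matches.columns[row]) for row in matches.to_numpy()],
    index=matches.index,
)
print(matched_cols.tolist())  # ['B', '', 'C']
```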

5 answers

Answer 1  9  2017-03-29 12:33:32
Answer 2  3  2017-03-29 12:42:12
Answer 3  2  2017-03-29 12:24:38
Answer 4  1  2017-03-29 12:31:26
Answer 5  1  2017-03-29 12:47:33
