Python pandas: column conditional on the values of two other columns

Question

Is there a way in Python pandas to apply a condition if one column or another has a value?

For one column, I know I can use the following code to apply a test flag if the column Title includes the word "test":

df['Test_Flag'] = np.where(df['Title'].str.contains("test|Test"), 'Y', '')

But if I want to add the test flag when either column Title or column Subtitle includes the word "test", how can I do that?

This obviously didn't work:

df['Test_Flag'] = np.where(df['Title'|'Subtitle'].str.contains("test|Test"), 'Y', '')
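Why the attempt fails: Python evaluates the expression inside the brackets before pandas ever sees it, and the `|` operator is not defined between two plain strings, so `'Title'|'Subtitle'` raises a `TypeError` on its own. A minimal demonstration (no pandas needed):

```python
# The subscript df['Title'|'Subtitle'] never reaches pandas:
# Python first tries to compute 'Title' | 'Subtitle' by itself.
message = None
try:
    'Title' | 'Subtitle'
except TypeError as exc:
    message = str(exc)

print(message)  # unsupported operand type(s) for |: 'str' and 'str'
```

The `|` has to combine two boolean Series (one per column), not two column names.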

Answer 1

If there are many columns, it is simpler to create the subset df[['Title', 'Subtitle']] and apply contains to each column (str.contains works only on a Series), then check for at least one True per row with any:

mask = df[['Title', 'Subtitle']].apply(lambda x: x.str.contains("test|Test")).any(axis=1)
df['Test_Flag'] = np.where(mask,'Y', '')
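The same idea scales to any number of columns. A sketch wrapping it in a small helper; the function name `flag_if_any_contains` and the extra `Notes` column are hypothetical, made up for illustration:

```python
import numpy as np
import pandas as pd


def flag_if_any_contains(df, columns, pattern, flag='Y'):
    """Hypothetical helper: return `flag` for rows where any of `columns` matches `pattern`."""
    # apply() hands each column to the lambda as a Series, so .str.contains works;
    # any(axis=1) then asks whether at least one column matched in each row.
    mask = df[columns].apply(lambda col: col.str.contains(pattern)).any(axis=1)
    return np.where(mask, flag, '')


df = pd.DataFrame({'Title': ['test', 'Test', 'e', 'a', 'a'],
                   'Subtitle': ['b', 'a', 'Test', 'a', 'a'],
                   'Notes': ['x', 'x', 'x', 'unit test', 'x']})
df['Test_Flag'] = flag_if_any_contains(df, ['Title', 'Subtitle', 'Notes'], "test|Test")
print(df['Test_Flag'].tolist())  # ['Y', 'Y', 'Y', 'Y', '']
```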

Sample:

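import numpy as np
import pandas as pd

df = pd.DataFrame({'Title':['test','Test','e', 'a'], 'Subtitle':['b','a','Test', 'a']})
# True for each row where at least one of the two columns contains the pattern
mask = df[['Title', 'Subtitle']].apply(lambda x: x.str.contains("test|Test")).any(axis=1)
df['Test_Flag'] = np.where(mask, 'Y', '')
print(df)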
  Subtitle Title Test_Flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a

Answer 2

pattern = "test|Test"
match = df['Title'].str.contains(pattern) | df['Subtitle'].str.contains(pattern)
df['Test_Flag'] = np.where(match, 'Y', '')
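Two `str.contains` parameters are worth knowing for this pattern: `case=False` makes the match case-insensitive (so "TEST" is caught too, which "test|Test" would miss), and `na=False` treats missing values as non-matches, which keeps the `|` combination purely boolean. A sketch, assuming some cells may be NaN (the sample data is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Title': ['test', 'TEST', np.nan, 'a'],
                   'Subtitle': ['b', np.nan, 'Test', 'a']})

pattern = "test"
# case=False: case-insensitive match; na=False: NaN cells count as "no match"
match = (df['Title'].str.contains(pattern, case=False, na=False)
         | df['Subtitle'].str.contains(pattern, case=False, na=False))
df['Test_Flag'] = np.where(match, 'Y', '')
print(df['Test_Flag'].tolist())  # ['Y', 'Y', 'Y', '']
```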

Answer 3

Using @jezrael's setup

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Title':['test','Test','e', 'a'],
     'Subtitle':['b','a','Test', 'a']})

`pandas`

You can use stack + str.contains + unstack:

import re

df.stack().str.contains('test', flags=re.IGNORECASE).unstack()

  Subtitle  Title
0    False   True
1    False   True
2     True  False
3    False  False
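To see what each step of the stack + str.contains + unstack chain does, a small sketch on the same data: `stack()` reshapes the two columns into one Series indexed by (row, column), so a single `str.contains` call covers every cell; `unstack()` restores the row-by-column layout, and `any(axis=1)` collapses each row to one boolean.

```python
import re

import pandas as pd

df = pd.DataFrame({'Title': ['test', 'Test', 'e', 'a'],
                   'Subtitle': ['b', 'a', 'Test', 'a']})

# stack(): the 4x2 frame becomes one Series of 8 cells,
# indexed by (row label, column name)
long = df.stack()
print(long.index.nlevels, len(long))  # 2 8

# One vectorized, case-insensitive string test over every cell at once
hits = long.str.contains('test', flags=re.IGNORECASE)

# unstack() pivots the column level back out; any(axis=1) reduces per row
per_row = hits.unstack().any(axis=1)
print(per_row.tolist())  # [True, True, True, False]
```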

Putting it all together:

truth_map = {True: 'Y', False: ''}
truth_flag = df.stack().str.contains(
    'test', flags=re.IGNORECASE).unstack().any(axis=1).map(truth_map)
df.assign(Test_flag=truth_flag)

  Subtitle Title Test_flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a

`numpy`

If performance is a concern:

v = df.values.astype(str)
# lowercase every cell, then search for the substring (find returns -1 when absent)
low = np.char.lower(v)
flg = np.char.find(low, 'test') >= 0
ys = np.where(flg.any(axis=1), 'Y', '')
df.assign(Test_flag=ys)

  Subtitle Title Test_flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a

[Figure: naive time test]
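The timing chart that accompanied this answer is not reproduced here. A rough way to rerun such a comparison yourself is sketched below with the standard `timeit` module; the frame size, the repeat counts and the function names (`with_apply`, `with_or`, `with_stack`, `with_numpy`) are arbitrary choices for illustration, and no particular ranking is implied.

```python
import re
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: the 4 sample rows repeated 2,500 times
base = pd.DataFrame({'Title': ['test', 'Test', 'e', 'a'],
                     'Subtitle': ['b', 'a', 'Test', 'a']})
df = pd.concat([base] * 2_500, ignore_index=True)


def with_apply(df):  # Answer 1
    mask = df[['Title', 'Subtitle']].apply(
        lambda x: x.str.contains("test|Test")).any(axis=1)
    return np.where(mask, 'Y', '')


def with_or(df):  # Answer 2
    pattern = "test|Test"
    match = df['Title'].str.contains(pattern) | df['Subtitle'].str.contains(pattern)
    return np.where(match, 'Y', '')


def with_stack(df):  # Answer 3, pandas variant
    hits = df.stack().str.contains('test', flags=re.IGNORECASE).unstack()
    return np.where(hits.any(axis=1), 'Y', '')


def with_numpy(df):  # Answer 3, numpy variant
    low = np.char.lower(df.to_numpy().astype(str))
    return np.where((np.char.find(low, 'test') >= 0).any(axis=1), 'Y', '')


# Make sure all four approaches agree before timing them
expected = with_apply(df)
for fn in (with_or, with_stack, with_numpy):
    assert (fn(df) == expected).all(), fn.__name__

for fn in (with_apply, with_or, with_stack, with_numpy):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f'{fn.__name__:12s} {seconds / 5 * 1000:.2f} ms per call')
```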

3 answers

Answer 1: score 4, 2017-03-27 19:50:39

Answer 2: score 3, accepted, 2017-03-27 19:52:13

Answer 3: score 2, 2017-03-27 20:35:29
