Using regex to find a word in one column of a pandas DataFrame and set a new value in a different column

Question

Suppose I have a DataFrame with the following contents:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Peter', 'Sue'],
                   'Job': ['Dentist', 'Blogger', 'Cook', 'Cook'],
                   'Sector': ['Health', 'Entertainment', '', '']})

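I want to find every row whose 'Job' is "cook", regardless of capitalization, and assign the value 'gastronomy' to the 'Sector' column for those rows. How can I do this without overwriting the other entries in the 'Sector' column? Thanks!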

Answer 1

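Here is one way: lowercase the 'Job' column, compare it to 'cook', and use the resulting boolean mask with df.loc so that only the matching rows of 'Sector' are assigned: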

df.loc[df.Job.str.lower().eq('cook'), 'Sector'] = 'gastronomy'

print(df)

    Name      Job         Sector
0   John  Dentist         Health
1  Alice  Blogger  Entertainment
2  Peter     Cook     gastronomy
3    Sue     Cook     gastronomy
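
If the 'Job' column could contain missing values or stray whitespace (an assumption; the sample data in the question has neither), a slightly more defensive variant of the same idea might look like the sketch below. The extra row and the messy spellings are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: same shape as the question, plus messy values
df = pd.DataFrame({'Name': ['John', 'Alice', 'Peter', 'Sue', 'Ann'],
                   'Job': ['Dentist', 'Blogger', 'cook ', 'COOK', None],
                   'Sector': ['Health', 'Entertainment', '', '', '']})

# Normalize: strip surrounding whitespace and fold case.
# Missing values stay missing after the .str operations.
job = df['Job'].str.strip().str.casefold()

# eq() yields False for missing values, so the mask is purely boolean
mask = job.eq('cook')

# Only rows where the mask is True are assigned; other rows are untouched
df.loc[mask, 'Sector'] = 'gastronomy'
print(df)
```

Because the assignment goes through df.loc with a boolean mask, the existing 'Health' and 'Entertainment' entries are left as they are.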

Answer 2

Use Series.str.match with a regex and the inline case-insensitive flag (?i):

df.loc[df['Job'].str.match('(?i)cook'), 'Sector'] = 'gastronomy'

Output


    Name      Job         Sector
0  John   Dentist  Health       
1  Alice  Blogger  Entertainment
2  Peter  Cook     gastronomy   
3  Sue    Cook     gastronomy
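
One caveat worth keeping in mind: Series.str.match only anchors the pattern at the start of the string, so a value such as 'Cookbook author' would also match. The sketch below compares it with Series.str.fullmatch (whole string) and Series.str.contains with word boundaries (the word anywhere). The sample job titles are hypothetical and not from the question, and fullmatch assumes pandas 1.1 or later.

```python
import re
import pandas as pd

# Hypothetical job titles chosen to show how the three methods differ
jobs = pd.Series(['Cook', 'COOK', 'Cookbook author', 'Line cook', None])

# match: pattern must match at the START of the string
starts = jobs.str.match(r'cook', flags=re.IGNORECASE, na=False)

# fullmatch: pattern must match the ENTIRE string
whole = jobs.str.fullmatch(r'cook', flags=re.IGNORECASE, na=False)

# contains with \b: 'cook' as a whole word ANYWHERE in the string
word = jobs.str.contains(r'\bcook\b', flags=re.IGNORECASE, na=False)

print(pd.DataFrame({'Job': jobs, 'match': starts,
                    'fullmatch': whole, 'contains_word': word}))
```

Here match flags 'Cookbook author' as a hit while fullmatch and the word-boundary search do not, and only the word-boundary search picks up 'Line cook'. Passing na=False makes missing values count as non-matches, so any of these masks can be used directly in df.loc.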
