How to check whether a column of text contains a specific string in pandas
I have the following dataframe in pandas:
job_desig          salary
senior analyst     12
junior researcher  5
scientist          20
sr analyst         12
Now I want to generate a new column that holds a flag, set to 1 when job_desig contains any of the strings in the list below and 0 otherwise:
sr = ['senior','sr']
job_desig          salary  senior_profile
senior analyst     12      1
junior researcher  5       0
scientist          20      0
sr analyst         12      1
I am doing the following in pandas:
df['senior_profile'] = [1 if x.str.contains(sr) else 0 for x in
df['job_desig']]
You can join all values of the list with | to build a regex OR pattern, pass it to Series.str.contains, and finally cast the result to integer to map True/False to 1/0:
df['senior_profile'] = df['job_desig'].str.contains('|'.join(sr)).astype(int)
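For reference, the one-liner above can be run end to end as the minimal sketch below. The dataframe is rebuilt from the question; the extra row with a missing job_desig, the re.escape call, and na=False are additions assumed here for robustness and are not part of the original answer.

```python
import re
import pandas as pd

# Data from the question, plus one row with a missing value (an added assumption)
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst", None],
    "salary": [12, 5, 20, 12, 7],
})
sr = ["senior", "sr"]

# Escape each keyword so that characters such as '.' or '+' are matched literally
pattern = "|".join(re.escape(word) for word in sr)

# na=False counts missing values as "no match", so the integer cast cannot fail
df["senior_profile"] = df["job_desig"].str.contains(pattern, na=False).astype(int)
print(df)
```

Without na=False, a missing job_desig would produce a missing value in the result, and the cast to integer would raise an error.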
If necessary, use word boundaries so that a keyword such as sr is matched only as a whole word and not inside a longer word:
pat = '|'.join(r"\b{}\b".format(x) for x in sr)
df['senior_profile'] = df['job_desig'].str.contains(pat).astype(int)
print(df)
           job_desig  salary  senior_profile
0     senior analyst      12               1
1  junior researcher       5               0
2          scientist      20               0
3         sr analyst      12               1
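The effect of the word boundaries does not show on the question's data, because no title there contains sr inside a longer word. The sketch below uses made-up titles (an assumption, not from the question) to compare the two patterns; case=False is also an addition, so that capitalised titles match.

```python
import re
import pandas as pd

# Hypothetical titles: "classroom" happens to contain the letters "sr"
titles = pd.Series(["sr analyst", "classroom assistant", "Senior Engineer"])
sr = ["senior", "sr"]

# Plain alternation: matches the keywords anywhere, even inside other words
plain = "|".join(re.escape(w) for w in sr)
# Bounded alternation: each keyword must stand alone as a whole word
bounded = "|".join(r"\b{}\b".format(re.escape(w)) for w in sr)

loose = titles.str.contains(plain, case=False).astype(int)
strict = titles.str.contains(bounded, case=False).astype(int)

print(loose.tolist())   # "classroom assistant" is flagged by the plain pattern
print(strict.tolist())  # with boundaries it is not
```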
Solution with sets, if the list contains only single-word values:
df['senior_profile'] = [int(bool(set(sr).intersection(x.split()))) for x in df['job_desig']]
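The set-based line above assumes every value is a string and that case matches exactly. A slightly more defensive variant might look like the sketch below; the helper name has_keyword, the lower-casing, and the handling of missing values are assumptions added for illustration.

```python
import pandas as pd

# Sample titles; the mixed case and the missing value are added assumptions
df = pd.DataFrame({"job_desig": ["Senior Analyst", "junior researcher", "sr analyst", None]})
keywords = {"senior", "sr"}

def has_keyword(text):
    """Return 1 if any whitespace-separated word of text is a keyword, else 0."""
    # Treat missing or non-string values as "no match"
    if not isinstance(text, str):
        return 0
    # isdisjoint is False as soon as one word of the title is in the keyword set
    return int(not keywords.isdisjoint(text.lower().split()))

df["senior_profile"] = df["job_desig"].map(has_keyword)
print(df)
```

Because the text is split on whitespace, this only works for single-word keywords, and a title written as "sr." with trailing punctuation would not match.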
You can do this by simply using str.contains. Note that this yields True/False; cast the result with .astype(int) if 1/0 is needed:
df['senior_profile'] = df['job_desig'].str.contains('senior') | df['job_desig'].str.contains('sr')