Python Pandas - check for substring containment and set new column to substring
I need to check for string containment and set the new column to the substring value. I am currently trying this:
df['NEW_COL'] = df['COL_TO_CHECK'].str.contains('|'.join(substring_list))
Instead of returning a boolean True/False for containment, I need to return the actual value from substring_list that matches, to populate df['NEW_COL']. For example:
substring_list = ['apple', 'banana', 'cherry']
OLD_COL            NEW_COL
apple pie          apple
black cherry       cherry
banana lemon drop  banana
You haven't said much about what your data looks like or what exactly you want, but the general principle is that you can use:
df['NEW_COL'] = df['COL_TO_CHECK'].apply(lambda x: do_something(x) if is_something(x) else x)
Or in your example:
substring_list = set(['apple', 'banana', 'cherry'])
df['NEW_COL'] = df['OLD_COL'].apply(lambda x: set(x.split()).intersection(substring_list).pop())
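A set is faster :) Note that this only matches whole whitespace-separated words, and .pop() raises KeyError for a row that contains none of the substrings.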
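For anyone who wants to run the set-based idea, here is a minimal sketch. The helper name first_match, the extra 'plain toast' row, and the choice to return None for rows without a match are all assumptions added for illustration. They are not part of the original answer.

```python
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']
wanted = set(substring_list)  # set membership tests are O(1) on average

# Sample data from the question, plus one non-matching row (an assumption)
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry',
                               'banana lemon drop', 'plain toast']})

def first_match(text):
    # Hypothetical helper: return the first whitespace-separated word of
    # `text` that is in `wanted`, or None when no word matches.
    for word in text.split():
        if word in wanted:
            return word
    return None

df['NEW_COL'] = df['OLD_COL'].apply(first_match)
print(df)
```

Like the one-liner, this compares whole words only, so 'apples' would not match 'apple'.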
I'd do it this way:
In [148]: df
Out[148]:
             OLD_COL
0          apple pie
1       black cherry
2  banana lemon drop
In [149]: pat = '.*({}).*'.format('|'.join(substring_list))
In [150]: pat
Out[150]: '.*(apple|banana|cherry).*'
In [151]: df['NEW_COL'] = df['OLD_COL'].str.replace(pat, r'\1', regex=True)
In [152]: df
Out[152]:
             OLD_COL  NEW_COL
0          apple pie    apple
1       black cherry   cherry
2  banana lemon drop   banana
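A possible alternative to the str.replace approach, sketched here under a few assumptions, is Series.str.extract with a single capture group. This is a different technique from the one in the answer. The re.escape call and the extra 'plain toast' row are additions for illustration only.

```python
import re
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']

# Sample data from the question, plus one non-matching row (an assumption)
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry',
                               'banana lemon drop', 'plain toast']})

# Escape each item so any regex metacharacters in it are matched literally
pat = '({})'.format('|'.join(re.escape(s) for s in substring_list))

# expand=False returns a Series holding the first match in each row;
# rows with no match become NaN
df['NEW_COL'] = df['OLD_COL'].str.extract(pat, expand=False)
print(df)
```

One difference to be aware of: with str.replace, a row that matches nothing keeps its original text, whereas str.extract leaves it as NaN, which may be easier to filter on.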