
Python Pandas - check for substring containment and set new column to substring

I need to check for substring containment and set the new column to the matching substring value. I am currently trying this:

df['NEW_COL'] = df['COL_TO_CHECK'].str.contains('|'.join(substring_list))

Instead of returning a boolean True/False for containment, I need to return the actual value from substring_list that matched, to populate df['NEW_COL'].

SUBSTRINGS TO CHECK FOR

substring_list = ['apple', 'banana', 'cherry']

RESULTING DATAFRAME

OLD_COL              NEW_COL
apple pie            apple
black cherry         cherry
banana lemon drop    banana

You haven't said much about what your data looks like or exactly what you want, but the general principle is that you can use:

df['NEW_COL'] = df['COL_TO_CHECK'].apply(lambda x: do_something(x) if is_something(x) else x)
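As one concrete instance of that general pattern, here is a minimal sketch that checks true substring containment (so "pineapple tart" would match "apple") and returns the first entry of substring_list found in each row, or None when nothing matches. The helper name first_match is hypothetical, not part of pandas, and the sample frame simply reuses the rows from the question.

```python
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']

def first_match(text, candidates):
    """Hypothetical helper: return the first candidate (in list order) found anywhere in text, else None."""
    for candidate in candidates:
        if candidate in text:  # plain substring test, not whole-word matching
            return candidate
    return None

df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop']})
df['NEW_COL'] = df['OLD_COL'].apply(lambda x: first_match(x, substring_list))
print(df)
```

"First" here means first in substring_list order, not leftmost position in the string; if a row can contain several candidates and position matters, a regex search is a better fit.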

Or, for your example:

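substring_set = {'apple', 'banana', 'cherry'}
# Intersect each row's words with the set; next(..., None) avoids the KeyError that set.pop() raises when nothing matches
df['NEW_COL'] = df['OLD_COL'].apply(lambda x: next(iter(set(x.split()) & substring_set), None))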

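A set is used because membership and intersection tests on a set are faster than scanning a list. Note that splitting on whitespace means only whole words match: "apples" would not match "apple".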

I'd do it this way:

In [148]: df
Out[148]:
             OLD_COL
0          apple pie
1       black cherry
2  banana lemon drop

In [149]: pat = '.*({}).*'.format('|'.join(substring_list))

In [150]: pat
Out[150]: '.*(apple|banana|cherry).*'

In [151]: df['NEW_COL'] = df['OLD_COL'].str.replace(pat, r'\1', regex=True)

In [152]: df
Out[152]:
             OLD_COL NEW_COL
0          apple pie   apple
1       black cherry  cherry
2  banana lemon drop  banana
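A possible alternative, sketched here rather than taken from the answer above, is Series.str.extract. With str.replace, a row that matches nothing keeps its original text, and the leading greedy .* makes the last occurrence win when a row contains several candidates; str.extract instead returns the leftmost match and NaN for non-matching rows. The "plain toast" row is a hypothetical addition to show the no-match case, and re.escape is used so entries containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']
# 'plain toast' is a hypothetical row added to show what happens when nothing matches
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop', 'plain toast']})

# Escape each entry so characters like '.' or '+' are matched literally
pat = '({})'.format('|'.join(map(re.escape, substring_list)))

# expand=False returns a Series for a single capture group; non-matching rows become NaN
df['NEW_COL'] = df['OLD_COL'].str.extract(pat, expand=False)
print(df)
```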
