Check if a pandas DataFrame contains a specific string from a list of items
I have a list:
my_list = ['element1 line','element2 ','element3', 'element4 line',....]
and a pandas DataFrame df with a Sentences column and a df['flag'] column:
df
   Sentences               flag
0  abcd
1  efgh
2  element1 ijkl
3  mnop element3 element4
4  qrst
I want to iterate over every row of the Sentences column. If any of the elements in my_list is present in that row's Sentences value, df['flag'] should be 1 for that row. If no element is present in that row's sentence, df['flag'] should be 0 for that row.
Expected output:
df
   Sentences               flag
0  abcd                    0
1  efgh                    0
2  element1 ijkl           1
3  mnop element3 element4  1
4  qrst                    0
You need to use a loop (here, a list comprehension):
df['flag'] = [int(any(w in my_list for w in x.split())) for x in df['Sentences']]
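A self-contained version of the list comprehension, runnable as-is. It assumes a hypothetical my_list of single-word items: because each sentence is split on whitespace, every word is compared against whole list items, so a multi-word item such as 'element1 line' could never match. Converting the list to a set is an addition of this sketch (faster membership tests when the list is long), not part of the original answer.

```python
import pandas as pd

# Hypothetical single-word items so that whole-word matching applies.
my_list = ['element1', 'element2', 'element3', 'element4']
df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})

# A set makes each "w in wanted" check constant time instead of a list scan.
wanted = set(my_list)

# Split each sentence on whitespace; flag 1 if any word is a list item.
df['flag'] = [int(any(w in wanted for w in x.split())) for x in df['Sentences']]
print(df)
```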
Output:
   Sentences               flag
0  abcd                    0
1  efgh                    0
2  element1 ijkl           1
3  mnop element3 element4  1
4  qrst                    0
Note that you could use pure pandas, but this is much slower:
df['flag'] = (df['Sentences']
.str.split()
.explode().isin(my_list)
.groupby(level=0).any().astype(int)
)
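The groupby(level=0) step works because explode() repeats the original index label for every word it produces, so grouping on the index collapses the words back to one value per original row. A small sketch, with a hypothetical single-word my_list, makes that visible:

```python
import pandas as pd

my_list = ['element1', 'element3']  # hypothetical single-word items
df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})

# One row per word; each word keeps the index label of its sentence.
words = df['Sentences'].str.split().explode()
print(words.index.tolist())  # [0, 1, 2, 2, 3, 3, 3, 4]

# Per-word membership test, then "any match" per original row.
hits = words.isin(my_list)
df['flag'] = hits.groupby(level=0).any().astype(int)
print(df)
```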
You can also avoid the nested loop by intersecting each row's set of words with my_list:
df['flag'] = (df['Sentences'].str.split()
              .map(set)
              .apply(lambda x: int(bool(x.intersection(my_list)))))
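All of the answers above split sentences into words, so they only match single-word items. If my_list really contains phrases such as 'element1 line', or items with trailing spaces such as 'element2 ', a substring test may fit the question better. This is a different technique from the answers (an escaped regex alternation passed to Series.str.contains), shown as a sketch; the last sample sentence is hypothetical.

```python
import re
import pandas as pd

my_list = ['element1 line', 'element2 ', 'element3', 'element4 line']
df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst',
                                 'see element1 line here']})

# Escape each item so characters like '.' or '(' are matched literally,
# then join with '|' so the pattern matches any one of the items.
pattern = '|'.join(map(re.escape, my_list))

# na=False treats missing sentences as "no match" instead of NaN.
df['flag'] = df['Sentences'].str.contains(pattern, regex=True, na=False).astype(int)
print(df)
```

Substring matching also fires inside longer words (for example 'element3' inside 'element30'); adding \b word boundaries around the pattern would tighten that, though it interacts awkwardly with items that end in a space.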
Hi, is it possible to return the matching value from the list instead of only true or false? Something like:
0  abcd
1  efgh
2  element1 ijkl           element1
3  mnop element3 element4  element3
4  qrst
Thank you
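One possible answer to this follow-up, as a sketch: instead of reducing the per-word test to 0/1, keep the words that match and join them. The single-word my_list is again hypothetical, rows without a match get an empty string, and several matches in one sentence would be joined with a space.

```python
import pandas as pd

# Hypothetical single-word items, as in the whole-word answers above.
my_list = ['element1', 'element3']
df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})
wanted = set(my_list)

# Keep matching words in sentence order; '' when nothing matches.
df['match'] = [' '.join(w for w in x.split() if w in wanted)
               for x in df['Sentences']]
print(df)
```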