Filtering DataFrame by finding exact word (not combined) in a column of strings
Creating a new column by finding exact word in a column of strings
I want to create a new column containing 1 or 0, depending on whether any word in a list exactly matches a word in a DataFrame string column.
list_provided=["mul","the"]
#how my dataframe looks
id text
a simultaneous there the
b simultaneous there
c mul why
Expected output
id text found
a simultaneous there the 1
b simultaneous there 0
c mul why 1
The second row is assigned 0 because neither "mul" nor "the" exactly matches a word in its "text" column.
Code tried so far
#For exact match I am using the below code
data["Found"]=np.where(data["text"].str.contains(r'(?:\s|^)penalidades(?:\s|$)'),1,0)
How can I loop over the provided list to find an exact match for any of its words?
Edit: if I use str.contains(pattern) as suggested by Georgey, every row of data["Found"] becomes 1
data=pd.DataFrame({"id":("a","b","c","d"), "text":("simultaneous there the","simultaneous there","mul why","mul")})
list_of_word=["mul","the"]
pattern = '|'.join(list_of_word)
data["Found"]=np.where(data["text"].str.contains(pattern),1,0)
Output:
id text found
a simultaneous there the 1
b simultaneous there 1
c mul why 1
d mul 1
The second row of the found column should be 0.
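A small sketch of why this happens, using plain Python strings rather than pandas: a substring search finds "mul" inside "simultaneous" and "the" inside "there", whereas comparing whole whitespace-separated words does not. The variable names below are only for illustration.

```python
text = "simultaneous there"
words = ["mul", "the"]

# Substring search: "mul" occurs inside "simultaneous", "the" inside "there"
substring_hit = any(w in text for w in words)

# Whole-word comparison: split on whitespace first, then compare each token
whole_word_hit = any(token in words for token in text.split())

print(substring_hit, whole_word_hit)
```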
You can do this with pd.Series.apply, using sum with a generator expression:
import pandas as pd
df = pd.DataFrame({'id': ['a', 'b', 'c'],
'text': ['simultaneous there the', 'simultaneous there', 'mul why']})
test_set = {'mul', 'the'}
df['found'] = df['text'].apply(lambda x: sum(i in test_set for i in x.split()))
# id text found
# 0 a simultaneous there the 1
# 1 b simultaneous there 0
# 2 c mul why 1
The above gives a count of matching words. If you only need a Boolean, use any:
df['found'] = df['text'].apply(lambda x: any(i in test_set for i in x.split()))
For an integer representation, chain .astype(int).
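If you would rather stay with str.contains, one possible alternative (not part of the answer above) is to wrap the alternation in word boundaries. This is a minimal sketch that assumes the listed words consist of ordinary word characters; re.escape is used in case they contain regex metacharacters.

```python
import re
import pandas as pd

data = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "text": ["simultaneous there the", "simultaneous there", "mul why", "mul"],
})
list_of_word = ["mul", "the"]

# \b requires a word boundary on both sides, so "mul" no longer matches
# inside "simultaneous"; (?:...) is a non-capturing group
pattern = r"\b(?:" + "|".join(re.escape(w) for w in list_of_word) + r")\b"

data["Found"] = data["text"].str.contains(pattern, regex=True).astype(int)
print(data)
```

Note that \b also treats punctuation as a boundary, so "mul-why" would count as a match here, while the split-based approach would treat it as a single word.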
Edit 1
Try this code
import pandas as pd

# Sample data: the texts first, then the matching ids
dataframe = [["simultaneous there the", "simultaneous there", "mul why", "mul"], ["a", "b", "c", "d"]]
list_of_word = ["mul", "the"]
dic = {
    "id": dataframe[1],
    "text": dataframe[0]
}
DataF = pd.DataFrame(dic)
found = []
for key in DataF["text"]:
    anyvari = False
    # Compare whole space-separated words, not substrings
    for word in key.split(" "):
        if word in list_of_word:
            anyvari = True
            break
    found.append(1 if anyvari else 0)
DataF["found"] = found
print(DataF)
It will give you something like this
id text found
0 a simultaneous there the 1
1 b simultaneous there 0
2 c mul why 1
3 d mul 1
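The same loop can be condensed into a small helper that works for a word list of any length. This is only a sketch; flag_exact_words is a hypothetical name, not a pandas function.

```python
import pandas as pd

def flag_exact_words(texts, words):
    """Return 1 for each text that contains any of `words` as a whole word, else 0."""
    word_set = set(words)
    # isdisjoint is True when no token of the text is in word_set
    return [int(not word_set.isdisjoint(text.split())) for text in texts]

DataF = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "text": ["simultaneous there the", "simultaneous there", "mul why", "mul"],
})
DataF["found"] = flag_exact_words(DataF["text"], ["mul", "the"])
print(DataF)
```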
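Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.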