
How to extract exact words from a string using Python and re?

The data sample is:

import re
import pandas as pd

a=pd.DataFrame({'Strings':['i xxx iwantto iii i xxx i',
                           'and you xxx and x you xxxxxx and you and you']})
b=['i','and you']

There are two words (phrases) in b, and I want to find them in a. I want to match the exact words, not substrings. So I want the result to be:

['i', 'i', 'i']
['and you', 'and you', 'and you']

I need to count how many times these words occur in each string, so I do not really need the lists above. I include them only to show that I want to find the exact words in the strings. Here is my attempt:

s='r\'^'+b[0]+' | '+b[0]+' | '+b[0]+'$\''
len(re.findall(s,a.loc[0,'Strings']))

I hoped s would find the word at the beginning, in the middle, and at the end of the string. My real a and b are large, so I cannot hard-code the strings here. But the result is:

len(re.findall(s,a.loc[0,'Strings']))
Out[110]: 1
re.findall(s,a.loc[0,'Strings'])
Out[111]: [' i ']

It looks like only the middle occurrence is matched. I am not sure where I went wrong.
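
One likely explanation, shown as a minimal sketch: the `r'` prefix and the closing quote are built into the contents of `s`, so the pattern literally contains those characters. The first alternative then requires a literal `r'` before `^`, and the last one requires a `'` after `$`, so only the middle alternative can ever match. The snippet reuses the first sample string from the question; the names `text`, `word` and `fixed` are introduced here for illustration only.

```python
import re

text = 'i xxx iwantto iii i xxx i'
word = 'i'

# The construction from the question: the raw-string prefix and the quotes
# become part of the pattern itself.
s = 'r\'^' + word + ' | ' + word + ' | ' + word + '$\''
print(s)                        # r'^i | i | i$'
print(re.findall(s, text))      # [' i ']

# The same alternation without the stray r' and ' characters.
fixed = '^' + word + ' | ' + word + ' | ' + word + '$'
print(re.findall(fixed, text))  # ['i ', ' i ', ' i']
```

Even the corrected alternation has a weakness: each match consumes its surrounding spaces, so two adjacent occurrences such as the final `and you and you` in the second sample string cannot both be matched, because they share a single space.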

a=pd.DataFrame({'Strings':['i xxx iwantto iii i xxx i',
                           'and you xxx and x you xxxxxx and you and you']})
print(a.Strings.str.findall('i |and you'))

Output

0                   [i , i , i ]
1    [and you, and you, and you]
Name: Strings, dtype: object

print(a.Strings.str.findall('{} |{}'.format(*b)))
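
Note that a pattern such as `'i '` also matches the last `i` of `iii`, because it only checks for a trailing space. If the goal is a count of whole-word occurrences, one possible alternative (a sketch, not part of the answer above) is to wrap each escaped phrase in `\b` word boundaries and use `Series.str.count`. The helper name `count_exact` is hypothetical, and the approach assumes every phrase starts and ends with a word character (a letter, digit or underscore).

```python
import re
import pandas as pd

a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
b = ['i', 'and you']

def count_exact(series, phrase):
    """Count whole-word occurrences of phrase in each string of series."""
    # \b asserts a word boundary without consuming characters, and
    # re.escape protects any regex metacharacters inside the phrase.
    pattern = r'\b' + re.escape(phrase) + r'\b'
    return series.str.count(pattern)

# One column of counts per phrase, one row per input string.
counts = pd.DataFrame({phrase: count_exact(a['Strings'], phrase) for phrase in b})
print(counts)
#    i  and you
# 0  3        0
# 1  0        3
```

Because `\b` is zero-width, adjacent occurrences that share a single space are each counted, and matches at the very start or end of the string need no special handling.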
