How can I detect multiple keywords in a Python string?
I am looking for a way to create several lists of keywords, extract any of those keywords that appear in the user's input, and match them with a response.
User Input: This is a good day I am heading out for a jog.
List 1: Keywords: good day, great day, awesome day, best day.
List 2: Keywords: a run, a swim, a game.
But for a huge database of words, can this be linked to just the lists, or does it need to be specific words?
Also, would you recommend Python for a huge database of keywords?
The first thing to do is to break the input string up into tokens. A token is just a piece of the string that you want to match. In your case, it looks like your token size is 2 words (but it doesn't have to be). You might also want to strip all punctuation from the input string.
Then for your input, your tokens are ['This is', 'is a', 'a good', 'good day', 'day I', 'I am', 'am heading', 'heading out', 'out for', 'for a', 'a jog']
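The tokenization step can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name `ngrams` and its `n` parameter are made up here, punctuation is taken to mean the ASCII characters in `string.punctuation`, and no case normalization is applied.

```python
import string

def ngrams(text, n=2):
    """Return overlapping n-word tokens from text, ignoring punctuation.

    Hypothetical helper for illustration; not part of any library.
    """
    # Remove ASCII punctuation, then split on any run of whitespace.
    cleaned = text.translate(str.maketrans('', '', string.punctuation))
    words = cleaned.split()
    # Slide a window of n words across the list; empty if too few words.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = ngrams('This is a good day I am heading out for a jog .')
print(tokens)
```

With `n=2` this yields the same eleven two-word tokens listed above. If your keyword lists mix phrase lengths, one option is to call the helper once per length and check each result.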
Then you can iterate over the tokens and check to see if they're contained in each one of the lists. It might look like this:
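text = 'This is a good day I am heading out for a jog'
list1 = ['good day', 'great day', 'awesome day', 'best day']
list2 = ['a run', 'a swim', 'a game']
words = text.split()
# Build overlapping two-word tokens from the input
tokens = [' '.join(words[i:i+2]) for i in range(len(words) - 1)]
for token in tokens:
    if token in list1:
        print('{} is in list1'.format(token))
    if token in list2:
        print('{} is in list2'.format(token))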
One thing you will likely want to do to optimize this is to use sets for list1 and list2 instead of lists.
set1 = set(list1)
Sets offer O(1) average-case lookups, as opposed to O(n) for lists, which is critical if your keyword lists are large.
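To connect the lists to responses, one possible approach is to merge all keyword sets into a single dictionary that maps each keyword to its group, so each token needs only one hash lookup however many lists you have. This is a sketch, not a definitive design: the group names, the response strings, and the `match_responses` helper are all hypothetical, and the sample sentence ends in "a swim" rather than "a jog" so that both lists produce a match.

```python
# Hypothetical keyword groups and responses, for illustration only.
keyword_groups = {
    'mood': {'good day', 'great day', 'awesome day', 'best day'},
    'activity': {'a run', 'a swim', 'a game'},
}
responses = {
    'mood': 'Glad to hear it!',
    'activity': 'Have fun!',
}

# Invert the groups into one dict: keyword -> group name.
keyword_to_group = {
    keyword: group
    for group, keywords in keyword_groups.items()
    for keyword in keywords
}

def match_responses(tokens):
    """Return one response per matched group, in order of first match."""
    matched = []
    for token in tokens:
        group = keyword_to_group.get(token)  # average-case O(1) lookup
        if group is not None and responses[group] not in matched:
            matched.append(responses[group])
    return matched

words = 'This is a good day I am heading out for a swim'.split()
tokens = [' '.join(words[i:i + 2]) for i in range(len(words) - 1)]
print(match_responses(tokens))  # ['Glad to hear it!', 'Have fun!']
```

Because the lookup goes through the dictionary, adding another keyword list means adding one entry to each of the two mappings; the matching loop itself does not change.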