
How can I detect multiple keywords in a Python string?

I am looking for a way to create several lists, extract the keywords in those lists from the user's input, and match them with a response.

User input: This is a good day I am heading out for a jog.

List 1: Keywords: good day, great day, awesome day, best day.
List 2: Keywords: a run, a swim, a game.

But for a huge database of words, can this be linked to just the lists? Or does it need to be specific words?

Also, would you recommend Python for a huge database of keywords?

The first thing to do is to break the input string up into tokens. A token is just a piece of the string that you want to match. In your case, it looks like your token size is 2 words (but it doesn't have to be). You might also want to strip all punctuation from the input string.

Then for your input, your tokens are ['This is', 'is a', 'a good', 'good day', 'day I', 'I am', 'am heading', 'heading out', 'out for', 'for a', 'a jog']
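The tokenizing step described above could be sketched roughly as follows. The helper name `ngram_tokens` and its parameter `n` are made up for illustration and are not part of the original answer. The sketch assumes ASCII punctuation only and splits on whitespace.

```python
import string

def ngram_tokens(text, n=2):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    # Remove ASCII punctuation, then split on any whitespace.
    cleaned = text.translate(str.maketrans('', '', string.punctuation))
    words = cleaned.split()
    # Slide a window of n words across the word list.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = ngram_tokens('This is a good day I am heading out for a jog .')
print(tokens)
```

Passing a different `n` gives tokens of another size, for example single words with `n=1`, if your keywords are not all two words long.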

Then you can iterate over the tokens and check to see if they're contained in each one of the lists. It might look like this:

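# Keyword lists from the question
list1 = ['good day', 'great day', 'awesome day', 'best day']
list2 = ['a run', 'a swim', 'a game']

# Named text rather than input, to avoid shadowing the built-in input()
text = 'This is a good day I am heading out for a jog'
words = text.split()
# Build every pair of adjacent words (2-word tokens)
tokens = [' '.join(words[i:i+2]) for i in range(len(words) - 1)]
for token in tokens:
    if token in list1:
        print('{} is in list1'.format(token))
    if token in list2:
        print('{} is in list2'.format(token))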

One thing you will likely want to do to optimize this is to use sets for list1 and list2 instead of lists.

set1 = set(list1)

Sets offer O(1) lookups on average, as opposed to O(n) for lists, which is critical if your keyword lists are large.
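To tie this back to the question, here is one possible sketch that stores each keyword list as a set and pairs it with a response. The group names, the response strings, and the function `match_responses` are all invented for illustration. The sketch also assumes lowercase matching and 2-word keywords.

```python
# Hypothetical keyword groups and responses, for illustration only.
KEYWORD_GROUPS = {
    'mood': ({'good day', 'great day', 'awesome day', 'best day'},
             'Glad to hear it!'),
    'activity': ({'a run', 'a swim', 'a game'},
                 'Enjoy the exercise!'),
}

def match_responses(text, n=2):
    """Return the response of every group that has a keyword in text."""
    words = text.lower().split()
    # Collect the n-word tokens in a set so the comparison is a set intersection.
    tokens = {' '.join(words[i:i + n]) for i in range(len(words) - n + 1)}
    responses = []
    for name, (keywords, response) in KEYWORD_GROUPS.items():
        if tokens & keywords:  # non-empty intersection means a keyword matched
            responses.append(response)
    return responses

print(match_responses('This is a good day I am heading out for a run'))
```

With this layout, adding another keyword list only means adding another entry to the dictionary. The matching loop does not change.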
