
Check for words in a sentence

I am writing a program in Python. The user enters a text message, and I need to check whether a sequence of words occurs in it. For example, for the message "Hello world, my friend." and the two words "Hello", "world", the result is True. But for the message "Hello, beautiful world" the same check gives False. When only two words need to be checked, it can be done the way I did it in the code below, but with combinations of 5 or more words it becomes difficult. Is there a compact solution to this problem?

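s = message.text   # text of the incoming message
s = s.lower()
lst = s.split()
E = False
if "hello" in lst and "world" in lst:
    c = lst.index("hello")
    # "world" must sit directly after or directly before "hello"
    after = c + 1 < len(lst) and lst[c + 1] == "world"
    before = c > 0 and lst[c - 1] == "world"
    E = after or before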

The straightforward way is to use a loop. Split your message into individual words, and then check whether each of them appears anywhere in the sentence.

word_list = message.split()     # this gives you a list of words to find
word_found = True
for word in word_list:
    if word not in message2:
        word_found = False

print(word_found)

The flag word_found is True if and only if all words were found in the sentence. There are many ways to make this shorter and faster, especially using the built-in all function and providing the word list as an in-line expression.

word_found = all(word in message2 for word in message.split())

Now, if you need to restrict your "found" property to matching exact words, you'll need more preprocessing. The above code is too forgiving of substrings: it would find "are you ok" in the sentence "your joke is only barely funny", because "are" occurs inside "barely", "you" inside "your", and "ok" inside "joke". For the more restrictive case, you should break message2 into words, strip those words of punctuation, convert them to lower case (to make matching easier), and then look for each word (from message) in the list of words from message2.

Can you take it from there?
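As a rough sketch of the preprocessing described above: the helper names normalize and all_words_present are hypothetical (they do not appear in the original answer), and the sketch assumes that punctuation only needs to be stripped from the ends of each word. Like the loop above, it checks presence only, not whether the words are adjacent.

```python
import string

def normalize(sentence):
    """Split a sentence into lower-case words with surrounding punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w]  # drop tokens that were only punctuation

def all_words_present(message, message2):
    """True if every word of `message` occurs as a whole word in `message2`."""
    sentence_words = set(normalize(message2))
    return all(word in sentence_words for word in normalize(message))

print(all_words_present("Hello world", "Hello, beautiful world"))          # True
print(all_words_present("are you ok", "your joke is only barely funny"))   # False
```

Using a set for the sentence words keeps each lookup cheap; a plain list would also work for short messages.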

I don't know if this is what you really need, but it worked for me; you can test it:

message = 'hello world'
message2 = ' hello beautiful world'
if 'hello' in message and 'world' in message:
    print('yes')
else:
    print('no')
if 'hello' in message2 and 'world' in message2:
    print('yes')

Output: yes yes

Let me clarify your requirements first:

  • ignore case

  • consecutive sequence

  • match in any order, like a permutation or an anagram

  • support duplicated words

If the number of words is not too large, you can try this approach, which is easy to understand but not the fastest.

  1. Split the text message into words.
  2. Join them with ' '.
  3. List all the permutations of the words to check and join each with ' ' too. For example, if you want to check the sequence ['Hello', 'beautiful', 'world'], the permutations will be 'Hello beautiful world', 'Hello world beautiful', 'beautiful Hello world', and so on.
  4. Check whether at least one permutation, such as 'hello beautiful world', is contained in the joined text.

The sample code is here:

import itertools
import re

# brute force over permutations, O(nk!)
def checkWords(text, word_list):
    # extract all words, dropping spaces and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    # pad with spaces so that only whole words can match
    joined_text = ' ' + ' '.join(text_words) + ' '

    # list all the permutations of word_list, and match
    for words in itertools.permutations(word_list):
        if ' ' + ' '.join(words).lower() + ' ' in joined_text:
            return True
    return False

    # or use any, in a single expression:
    # return any(' ' + ' '.join(words).lower() + ' ' in joined_text
    #            for words in itertools.permutations(word_list))

def test():
    # True
    print(checkWords('Hello world, my friend.', ['Hello', 'world', 'my']))
    # False
    print(checkWords('Hello, beautiful world', ['Hello', 'world']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'beautiful']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'world']))

test()

But this gets expensive when the number of words is large: k words generate k! permutations, so the time complexity is O(nk!).
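To get a feel for how quickly k! grows, here is a small illustration. The helper name permutation_count is hypothetical, and it assumes all k words are distinct (itertools.permutations yields k! tuples either way, but duplicates repeat some of them).

```python
import math

def permutation_count(k):
    """Number of orderings the brute-force approach has to try for k words."""
    return math.factorial(k)

for k in (2, 5, 8, 10):
    print(k, permutation_count(k))
# 2 2
# 5 120
# 8 40320
# 10 3628800
```

Already at 10 words, each message would be compared against millions of candidate strings.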

I think a more efficient solution is a sliding window. The time complexity decreases to O(n):

import collections
import re

# sliding window, O(n)
def checkWords(text, word_list):
    # split all words without space and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    counter = collections.Counter(map(str.lower, word_list))
    start, end, count, all_indexes = 0, 0, len(word_list), []

    while end < len(text_words):
        counter[text_words[end]] -= 1
        if counter[text_words[end]] >= 0:
            count -= 1
        end += 1

        # if you want all the index of match, you can change here
        if count == 0:
            # all_indexes.append(start)
            return True

        if end - start == len(word_list):
            counter[text_words[start]] += 1
            if counter[text_words[start]] > 0:
                count += 1
            start += 1

    # return all_indexes
    return False
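The comments in the code above hint at collecting every match position instead of returning at the first one. A possible sketch of that variant follows; the function name find_all_matches is hypothetical, and it compares whole Counter objects per window for readability, which costs up to O(k) per step rather than the constant-time bookkeeping used above.

```python
import collections
import re

def find_all_matches(text, word_list):
    """Return the start index (counted in words) of every window of the text
    that is a permutation of word_list, ignoring case and punctuation."""
    text_words = re.findall(r"[\w']+", text.lower())
    k = len(word_list)
    if k == 0 or k > len(text_words):
        return []

    target = collections.Counter(w.lower() for w in word_list)
    window = collections.Counter(text_words[:k])
    matches = [0] if window == target else []

    for end in range(k, len(text_words)):
        start = end - k                      # index of the word leaving the window
        window[text_words[end]] += 1
        window[text_words[start]] -= 1
        if window[text_words[start]] == 0:
            del window[text_words[start]]    # keep the Counter comparison exact
        if window == target:
            matches.append(start + 1)
    return matches

print(find_all_matches('Hello, beautiful world Hello World', ['Hello', 'world']))  # [2, 3]
print(find_all_matches('Hello, beautiful world', ['Hello', 'world']))              # []
```

An empty result corresponds to False in the original function, and a non-empty one to True.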

