Find all occurrences of a word + substring

Question

I have a 'main' word, "LAUNCHER", and two other words, "LAUNCH" and "LAUNCHER". I want to find out, using regex, which of these words occur in the 'main' word. I'm using re.findall with the regex "(LAUNCH)|(LAUNCHER)", but this returns only LAUNCH, not both of them. How do I fix this?

import re
mainword = "launcher"
words = "(launch|launcher)"
matches = re.findall(words,mainword)
for match in matches:
  print(match)

Answer 1

You can try something like this:

import re
mainword = "launcher"
words = "(launch|launcher)"
for x in (re.findall(r"[A-Za-z@#]+|\S", words)):
    if x in mainword:
        print (x)

Result:

launch
launcher

Answer 2

If you're not required to use regular expressions, this can be done more efficiently with Python's in operator and a simple loop or list comprehension:

mainWord = "launcher"
words    = ["launch","launcher"]
matches  = [ word for word in words if word in mainWord ] 

# case insensitive...
matchWord = mainWord.lower()
matches   = [ word for word in words if word.lower() in matchWord ]
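
If the positions of the occurrences are needed as well, the same regex-free idea can be extended with str.find. This is only a minimal sketch: the helper name find_all and the sample string "launcher relaunch" are made up for illustration and do not come from the question.

```python
def find_all(main_word, word):
    """Hypothetical helper: every start index at which word occurs in main_word."""
    positions = []
    start = main_word.find(word)
    while start != -1:
        positions.append(start)
        # Advance by a single character so overlapping occurrences are found too.
        start = main_word.find(word, start + 1)
    return positions


main_word = "launcher relaunch"
for word in ["launch", "launcher"]:
    print(word, find_all(main_word, word))
# launch [0, 11]
# launcher [0]
```

An empty list means the word does not occur at all, so the result also answers the original yes/no question.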

Even if you do require regex, a loop is needed, because re.findall() only returns non-overlapping matches:

import re
pattern   = re.compile("launcher|launch")
mainWord  = "launcher"
matches   = []
startPos  = 0
lastMatch = None
while startPos < len(mainWord):
    if lastMatch:
        # retry from the same start, but cut the end short to find a shorter word
        match = pattern.match(mainWord, lastMatch.start(), lastMatch.end() - 1)
    else:
        match = pattern.match(mainWord, startPos)
    if not match:
        # nothing (more) starts at this position: move on to the next one
        startPos += 1
        lastMatch = None
        continue
    matches.append(match.group())
    lastMatch = match

print(matches)  # ['launcher', 'launch']

Note that, even with this loop, the longer words must appear before the shorter ones if you use the | operator in the regular expression. This is because | is never greedy: it tries the alternatives from left to right and takes the first one that matches, not the longest one.
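
The ordering effect described above is easy to check with re.findall directly. The sketch below also shows one possible way around it, searching for each word with its own pattern. The helper name words_in, the extra word "lunch", and the use of re.escape and re.IGNORECASE are illustrative assumptions, not part of the original answer.

```python
import re

main_word = "launcher"

# Alternation is tried left to right; the first alternative that matches wins.
short_first = re.findall(r"launch|launcher", main_word)
long_first = re.findall(r"launcher|launch", main_word)
print(short_first)  # ['launch']
print(long_first)   # ['launcher']


def words_in(main_word, words):
    """Hypothetical helper: return the words that occur anywhere in main_word."""
    found = []
    for word in words:
        # One search per word, so the order of the words no longer matters.
        # re.escape keeps characters such as '.' or '+' literal.
        if re.search(re.escape(word), main_word, flags=re.IGNORECASE):
            found.append(word)
    return found


print(words_in("LAUNCHER", ["launch", "launcher", "lunch"]))  # ['launch', 'launcher']
```

Because each word gets its own search, overlapping words such as "launch" and "launcher" are both reported regardless of the order in which they are listed.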
