I have the 'main' word, "LAUNCHER", and 2 other words, "LAUNCH" and "LAUNCHER". I want to find out (using regex) which of those words are contained in the 'main' word. I'm using re.findall with the regex "(LAUNCH)|(LAUNCHER)", but this only returns LAUNCH and not both of them. How do I fix this?
import re
mainword = "launcher"
words = "(launch|launcher)"
matches = re.findall(words,mainword)
for match in matches:
    print(match)
You can try something like this:
import re
mainword = "launcher"
words = "(launch|launcher)"
for x in re.findall(r"[A-Za-z@#]+|\S", words):
    if x in mainword:
        print(x)
Result:
launch
launcher
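To see why this works, note that re.findall is applied to the pattern string here, not to mainword. It only splits the pattern into tokens, and the "in" test does the actual matching. A quick look at the intermediate list, using the same names as the snippet above:

```python
import re

words = "(launch|launcher)"

# Split the pattern string into runs of letters and single non-space symbols.
tokens = re.findall(r"[A-Za-z@#]+|\S", words)
print(tokens)  # ['(', 'launch', '|', 'launcher', ')']
```

The symbols "(", "|" and ")" are dropped later only because they do not occur in mainword, so this trick is fragile if the main word can contain such characters.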
If you're not required to use regular expressions, this can be done more efficiently with the "in" operator and a simple loop or list comprehension:
mainWord = "launcher"
words = ["launch","launcher"]
matches = [ word for word in words if word in mainWord ]
# case insensitive...
matchWord = mainWord.lower()
matches = [ word for word in words if word.lower() in matchWord ]
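The same one-word-at-a-time idea can also go through the re module. Searching for each word separately avoids the overlap problem, since every search starts fresh on the main word. This is only a minimal sketch: it assumes the words are literal text (hence re.escape), and the helper name words_in and the extra word "lunch" are made up for illustration.

```python
import re

def words_in(main_word, words):
    """Return the words that occur somewhere in main_word, ignoring case."""
    found = []
    for word in words:
        # re.escape makes the word match as literal text, not as a pattern.
        if re.search(re.escape(word), main_word, re.IGNORECASE):
            found.append(word)
    return found

print(words_in("LAUNCHER", ["launch", "launcher", "lunch"]))
```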
Even if you do require regex, a loop is needed because re.findall() never returns overlapping matches:
import re
pattern = re.compile("launcher|launch")
mainWord = "launcher"
matches = []
startPos = 0
lastMatch = None
while startPos < len(mainWord):
    if lastMatch:
        # retry at the same start with a shorter end position,
        # so that a shorter alternative gets a chance to match
        match = pattern.match(mainWord, lastMatch.start(), lastMatch.end() - 1)
    else:
        match = pattern.match(mainWord, startPos)
    if not match:
        # nothing (more) matches at this start position: try the next one,
        # so that words starting later in mainWord are found too
        startPos += 1
        lastMatch = None
        continue
    matches.append(mainWord[match.start():match.end()])
    lastMatch = match
print(matches)
Note that, even with this loop, the longer words must appear before the shorter ones when you use the | operator in the regular expression. This is because | is not greedy: the alternatives are tried from left to right and the first one that matches wins, not the longest one.
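The effect of the alternation order is easy to check with plain re.findall. The variable names below are just for illustration:

```python
import re

text = "launcher"

# Shorter alternative first: the engine accepts "launch" and never tries "launcher".
short_first = re.findall("launch|launcher", text)

# Longer alternative first: "launcher" wins, and the consumed text is not reused.
long_first = re.findall("launcher|launch", text)

print(short_first)  # ['launch']
print(long_first)   # ['launcher']
```

In both cases only one word comes back, which is why a single findall call cannot report both overlapping words.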