
Python Regex: match any repeated words that are separated by exactly one other word

I need to use a regex to find repeated words that are separated by exactly one other word.

For example:

"all in all" should return: "all"

"good good good" should return nothing (the word in the middle is the same word, not a different one)

I have tried:

import re

p = re.compile(r'(\b\w+\b)\s\w+\s\1')
m = p.findall('all in all day in and day out bit by bit good good good')

print(m)

This returns ['all', 'bit', 'good'], but I only want it to return ['all', 'bit'].

Thanks in advance!

You just need to add a negative lookahead for the word immediately following the initial capture group, so that the regex cannot match (for example) good good:

import re

p = re.compile(r'(\b\w+\b)(?!\s\1\b)\s\w+\s\1\b')
m = p.findall('all in all day in and day out bit by bit good good good')

print(m)

Output:

['all', 'bit']
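
To make the role of each piece easier to see, here is the same pattern written with re.VERBOSE and one comment per part. This is only an annotated sketch of the pattern above; the names pattern, text and matches are chosen for illustration. The trailing \b is what stops the backreference from matching the start of a longer word such as "allow".

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(r"""
    (\b\w+\b)      # group 1: the candidate word
    (?!\s\1\b)     # fail if the very next word is the same word
    \s\w+\s        # exactly one separating word, with whitespace on both sides
    \1\b           # the candidate word again, as a whole word
""", re.VERBOSE)

text = 'all in all day in and day out bit by bit good good good'
matches = pattern.findall(text)
print(matches)

# The trailing \b rejects a longer word that merely starts with the candidate.
print(pattern.findall('all in allow'))
```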

If you want to include overlapping matches, make the entire regex a positive lookahead (thanks @ggorlen):

p = re.compile(r'(?=(\b\w+\b)(?!\s\1\b)\s\w+\s\1\b)')
m = p.findall('foo bar foo bar foo')
print(m)

Output:

['foo', 'bar', 'foo']
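
The difference comes from whether a match consumes text. A side-by-side sketch follows, using the illustrative names core, consuming, overlapping and positions. The plain pattern consumes "foo bar foo", so the search resumes after it. The lookahead version matches zero characters, so every word position is tried.

```python
import re

core = r'(\b\w+\b)(?!\s\1\b)\s\w+\s\1\b'
text = 'foo bar foo bar foo'

# Consuming version: the first match swallows 'foo bar foo',
# so the overlapping 'bar foo bar' and second 'foo bar foo' are never seen.
consuming = re.findall(core, text)

# Lookahead version: each match is zero-width, so matching restarts at every position.
overlapping = re.findall(f'(?={core})', text)

# finditer shows where each zero-width match starts.
positions = [(m.start(), m.group(1)) for m in re.finditer(f'(?={core})', text)]

print(consuming)
print(overlapping)
print(positions)
```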

If you also need to remove duplicate matches, convert to a set and back to a list:

p = re.compile(r'(?=(\b\w+\b)(?!\s\1\b)\s\w+\s\1\b)')
m = list(set(p.findall('foo bar foo bar foo')))
print(m)

Output (the order may vary, since a set is unordered):

['foo', 'bar']
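
If the order of first appearance matters, one possible alternative (not part of the original answer) is dict.fromkeys, which keeps insertion order on Python 3.7 and later:

```python
import re

p = re.compile(r'(?=(\b\w+\b)(?!\s\1\b)\s\w+\s\1\b)')

# dict keys are unique and keep insertion order, so duplicates are dropped
# while the first occurrence of each word stays in place.
m = list(dict.fromkeys(p.findall('foo bar foo bar foo')))
print(m)  # ['foo', 'bar']
```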

No need for regex; normal programming constructs can handle this sort of problem just fine. Write a loop and add a conditional:

s = 'all in all day in and day out bit by bit good good good'

words = s.split()
result = []

for i in range(len(words) - 2):
    if words[i] == words[i+2] and words[i] != words[i+1]:
        result.append(words[i])

print(result) # ['all', 'bit']
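
The same sliding-window idea can be sketched more compactly with zip. The function name repeated_around_one_word is made up for this example. Because every window of three words is checked, this approach reports overlapping matches, like the lookahead regex. It also splits on whitespace only, so punctuation stays attached to words ("all." is not equal to "all").

```python
def repeated_around_one_word(text):
    """Return each word that reappears two positions later with a different word between."""
    words = text.split()
    # zip pairs every word with its two successors, giving windows of three words.
    return [a for a, b, c in zip(words, words[1:], words[2:]) if a == c and a != b]

s = 'all in all day in and day out bit by bit good good good'
print(repeated_around_one_word(s))
print(repeated_around_one_word('foo bar foo bar foo'))
```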
