Regex to get word if it has char in mid but not in beginning or end

I have code that successfully checks whether a string has a given character in the middle but not at the beginning or the end. However, my regex matches the strings that have the character at the beginning or the end, and I then collect every entry the regex does not match, which is counterintuitive. In other words, my regex detects the words that start or end with a specific character, whereas I want a regex that detects the words that have that character in the middle but not at the beginning or the end.

This is my code:

import re

def getCharInMiddle():
    # with open("assets/grades.txt", "r") as file:
    #     grades = file.read()

    content = ["bDa", "MariaBkB", "DimAb", "OL gaBd"]
    pattern = "[a-zA-Z]*[bB][a-zA-Z]*"

    for entry in content:
        print(entry)
        # re.match only tries to match at the start of the string
        result = re.match(pattern, entry)
        if not result:
            # entries the pattern did NOT match are the ones collected
            print("\n ----> " + entry + "\n")

Is there a regex that checks whether an entry has the character in the middle but not at the beginning or the end?


Update

I updated the pattern to be:

pattern = "\b(?![bB])[a-zA-Z]+[bB][a-zA-Z]+(?<![bB])\b"

but then no entry is matched.
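A likely reason nothing matches: in a normal (non-raw) Python string literal, "\b" is the backspace character rather than the regex word-boundary token, and re.match only tries the start of the string. The following is a minimal sketch, reusing the sample data from the question, that compares the two spellings; the variable names are made up for illustration.

```python
import re

content = ["bDa", "MariaBkB", "DimAb", "OL gaBd"]

# Without the r prefix, "\b" is a backspace character (U+0008).
plain = "\b(?![bB])[a-zA-Z]+[bB][a-zA-Z]+(?<![bB])\b"
# With the r prefix, \b reaches the regex engine as a word boundary.
raw = r"\b(?![bB])[a-zA-Z]+[bB][a-zA-Z]+(?<![bB])\b"

plain_hits = [s for s in content if re.search(plain, s)]
raw_hits = [s for s in content if re.search(raw, s)]
# re.match anchors at position 0, so the word after the space is never tried.
raw_match_hits = [s for s in content if re.match(raw, s)]

print(plain_hits)      # []
print(raw_hits)        # ['OL gaBd']
print(raw_match_hits)  # []
```

So with the sample data, the raw-string pattern finds a word only when it is combined with re.search (or re.findall) instead of re.match.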

You can try this pattern:

^(?![bB]).*[bB].*(?<![bB])$

It matches strings that contain b or B but do not start or end with either of those two characters. Because of the ^ and $ anchors, it tests the whole string rather than individual words inside it.
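As a quick check, here is a small sketch that runs this pattern over the sample list from the question. The extra string "bad gaBd" is an assumption added only to show the whole-string behaviour.

```python
import re

content = ["bDa", "MariaBkB", "DimAb", "OL gaBd"]
pattern = re.compile(r"^(?![bB]).*[bB].*(?<![bB])$")

# Keep only the entries where the whole string passes the check.
matches = [s for s in content if pattern.search(s)]
print(matches)  # ['OL gaBd']

# The check is per string, not per word: this string starts with b,
# so it is rejected even though the word "gaBd" would qualify on its own.
print(pattern.search("bad gaBd"))  # None
```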

You can use

re.findall(r'\b(?!b[a-z]+b\b)[a-z]+b[a-z]+\b', text, re.I)          # ASCII only letters
re.findall(r'\b(?!b[^\W\d_]+b\b)[^\W\d_]+b[^\W\d_]+\b', text, re.I) # Any Unicode letters

See the regex demo. Note the use of the case-insensitive modifier (re.I).

Details:

  • \b - a word boundary
  • (?!b[^\W\d_]+b\b) - a negative lookahead that fails the match if, immediately to the right, there is a b, then one or more letters, then a b followed by a word boundary (that is, the word both starts and ends with b)
  • [^\W\d_]+ - one or more letters
  • b - the letter b (or B, since re.I is set)
  • [^\W\d_]+ - one or more letters
  • \b - a word boundary.

See the Python demo:

import re

content = ["bDa", "MariaBkB", "DimAb", "OL gaBd"]
pattern = re.compile(r'\b(?!b[^\W\d_]+b\b)[^\W\d_]+b[^\W\d_]+\b', re.I)
for s in content:
    print(pattern.findall(s))
# => [], ['MariaBkB'], [], ['gaBd']
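Note that the output above still includes MariaBkB, which ends with B, because the lookahead only rejects words that both start and end with b. If the intent is to reject any word that starts or ends with b, one possible stricter variant is sketched below. The pattern and the name strict are assumptions for illustration, not part of the answer above.

```python
import re

content = ["bDa", "MariaBkB", "DimAb", "OL gaBd"]

# Hypothetical stricter variant: the word must not start with b/B,
# must contain b/B somewhere, and must not end with b/B.
strict = re.compile(r"\b(?!b)[^\W\d_]*b[^\W\d_]*(?<!b)\b", re.I)

results = [strict.findall(s) for s in content]
print(results)  # [[], [], [], ['gaBd']]
```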

This may help you:

\b[^bB]\w+[bB]\w+[^bB]\b


\b - a word boundary
[^bB] - any single character other than b or B
\w+ - one or more word characters
[bB] - b or B in the middle
\w+ - one or more word characters
[^bB] - any single character other than b or B
\b - a word boundary
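One caveat: each \w+ needs at least one character, so this pattern only matches words of five or more characters, and [^bB] can also match non-word characters such as a space. The sketch below compares it with a hypothetical variant that uses \w* and [^\WbB] instead. The variant is an assumption for illustration, not part of the answer above.

```python
import re

original = re.compile(r"\b[^bB]\w+[bB]\w+[^bB]\b")
# Hypothetical variant: [^\WbB] is a word character other than b/B,
# and \w* allows zero characters on either side of the middle b.
variant = re.compile(r"\b[^\WbB]\w*[bB]\w*[^\WbB]\b")

words = ["gaBd", "rabid", "MariaBkB", "bDa"]
original_hits = [original.findall(w) for w in words]
variant_hits = [variant.findall(w) for w in words]

print(original_hits)  # [[], ['rabid'], [], []]
print(variant_hits)   # [['gaBd'], ['rabid'], [], []]
```

The four-letter word gaBd is missed by the original pattern but found by the variant, while words that start or end with b are rejected by both.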
