Regex - match words containing a hyphen only when they are all uppercase

Question

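I am trying to match words of more than one letter that are either all uppercase, start with a single lowercase letter followed by uppercase letters, or contain a hyphen in the middle, but only if all of the letters are uppercase. Here is my code: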

import re

s = "ASCII, aSCII, AS-CII, AS-cii"

myset = set(re.findall(r"\b[a-z]?[A-Z]+\-?[A-Z]{1,}", s))

Out[555]: {'AS', 'AS-CII', 'ASCII', 'aSCII'}

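As you can see, "AS" should not be returned, because it comes from "AS-cii", which has lowercase letters after the hyphen. How can I fix this?

I tried this, but it results in an error: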

myset = set(re.findall(r"\b[a-z]?[A-Z]+\-?[A-Z]+{1,}",s))

  File "<ipython-input-545-7bdc0c902553>"
    myset = set(re.findall(r"\b[a-z]?[A-Z]+\-?[A-Z]+{1,}",s))

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))

  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 619, in _parse
    source.tell() - here + len(this))

error: multiple repeat
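A note on the traceback above: `multiple repeat` is the message Python's `re` module gives when one quantifier directly follows another, as in `[A-Z]+{1,}`, where `{1,}` is applied to the already quantified `[A-Z]+`. The sketch below reproduces this in isolation. The variable names are illustrative only, and dropping the extra quantifier removes the error but does not by itself solve the "AS" problem.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

bad_pattern = r"\b[a-z]?[A-Z]+\-?[A-Z]+{1,}"   # "+" immediately followed by "{1,}"
good_pattern = r"\b[a-z]?[A-Z]+\-?[A-Z]+"      # one quantifier per element

try:
    re.compile(bad_pattern)
    error_message = None
except re.error as exc:
    # The stacked quantifiers are rejected at compile time
    error_message = str(exc)

print(error_message)

# This version compiles, but it still returns "AS" for "AS-cii"
still_wrong = re.findall(good_pattern, s)
print(still_wrong)
```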

Answer 1

Here we go:

res = [x[0] for x in re.findall(r"(([a-z]{1}[A-Z]+)|([A-Z]+\-[A-Z]+))",s)]
print(res)
print(set(res))

which gives

['aSCII', 'AS-CII']

Let me know if this works. I split the pattern into two parts and added OR logic (|) between them.
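As a variation on the snippet above (not part of the original answer), a non-capturing group `(?:...)` lets `re.findall` return the whole matches directly, so the `x[0]` indexing is not needed. Like the original pattern, this sketch does not pick up the plain all-uppercase word `ASCII`, which the question also asks for.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

# (?:...) groups the alternatives without creating a capture group,
# so findall returns the full matched text for each hit
pattern = re.compile(r"(?:[a-z][A-Z]+|[A-Z]+-[A-Z]+)")

matches = pattern.findall(s)
print(matches)
print(set(matches))
```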

Answer 2

You can use a conditional expression:

(...)?(?(1)if true then this|else this)

In your case, this could be

\b([a-z])?(?(1)[A-Z]+|[-A-Z]+[A-Z])(?!-)\b

See the demo on regex101.com.

Broken down, this reads

\b                  # a word boundary
([a-z])?            # match a lower case letter if it is there
(?(1)               # if the lower case letter is there, match this branch
    [A-Z]+
    |
    [-A-Z]+[A-Z]    # else this one
)
(?!-)\b             # do not break at a -, followed by another boundary
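The breakdown above can be tried in Python roughly as follows. This is a minimal sketch using the sample string from the question. Because the pattern contains a capturing group, `re.findall` would return only the contents of group 1, so the sketch collects the whole matches with `re.finditer` instead.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

pattern = re.compile(r"""
    \b                  # a word boundary
    ([a-z])?            # optional leading lowercase letter (group 1)
    (?(1)               # if group 1 matched ...
        [A-Z]+          # ... only uppercase letters may follow
        |
        [-A-Z]+[A-Z]    # otherwise uppercase letters and hyphens, ending in uppercase
    )
    (?!-)\b             # not followed by a hyphen, then a word boundary
""", re.VERBOSE)

# group(0) is the whole match, regardless of capture groups
matches = [m.group(0) for m in pattern.finditer(s)]
print(matches)

# findall reports group 1 only, which is not what is wanted here
groups_only = pattern.findall(s)
print(groups_only)
```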

Answer 3

The following regex matches all of the criteria mentioned:

\b[a-z]*[A-Z]+[\-A-Z]+[A-Z]+\b

See it here: https://regex101.com/r/JNC4kN/1/

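However, this fails on an example such as aTH-THTH (a lowercase letter followed by uppercase letters and a hyphen): the whole word is matched. If you only want UPPER-UPPER, use the following regex: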

\b[a-z]{0,1}(?<!\-)[A-Z]+\b(?!\-)|\b[A-Z]+\-[A-Z]+\b

Check it here.
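To compare the two patterns in this answer side by side, they can be run against the question's sample string and against aTH-THTH. This is only a sketch, and the names `loose` and `strict` are labels chosen here, not part of the answer.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

# First pattern from this answer
loose = re.compile(r"\b[a-z]*[A-Z]+[\-A-Z]+[A-Z]+\b")
# Second pattern: hyphenated words must be UPPER-UPPER
strict = re.compile(r"\b[a-z]{0,1}(?<!\-)[A-Z]+\b(?!\-)|\b[A-Z]+\-[A-Z]+\b")

loose_on_s = loose.findall(s)
strict_on_s = strict.findall(s)

# The problematic case: lowercase first letter combined with a hyphen
loose_on_mixed = loose.findall("aTH-THTH")
strict_on_mixed = strict.findall("aTH-THTH")

print(loose_on_s, strict_on_s)
print(loose_on_mixed, strict_on_mixed)
```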

Answer 4

You can use the following regex, which covers edge cases involving words that are preceded or followed by a hyphen (as shown at the link below):

(?<!\w|(?<=\w)-)(?:[a-zA-Z][A-Z]+|[A-Z]{2,}|[A-Z]+-[A-Z]+)(?!\w|-(?=\w))

Demo

Python's regex engine performs the following operations.

(?<!              # begin a negative lookbehind
  \w              # match word char
  |               # or
  (?<=\w)         # match a word char in a positive lookbehind
  -               # match '-'
)                 # end negative lookbehind
(?:               # begin non-cap grp
  [a-zA-Z][A-Z]+  # match any letter then 1+ uc letters
  |               # or
  [A-Z]{2,}       # match 2+ uc letters
  |               # or
  [A-Z]+-[A-Z]+   # match 1+ uc letters, '-', then 1+ uc letters
)                 # end non-cap grp
(?!               # begin negative lookahead
  \w              # match word char
  |               # or
  -               # match '-'
  (?=\w)          # match a word char in a positive lookahead
)                 # end negative lookahead
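The annotated pattern above can be written with `re.VERBOSE` almost as it stands. The sketch below runs it on the question's sample string and on one extra input, `co-OP`, which is a hypothetical example added here to show the hyphen lookbehind rejecting an uppercase part that follows a lowercase word.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

pattern = re.compile(r"""
    (?<!                # not preceded by ...
      \w                # a word char
      |                 # or
      (?<=\w)-          # a '-' that itself follows a word char
    )
    (?:
      [a-zA-Z][A-Z]+    # any letter, then 1+ uppercase letters
      |
      [A-Z]{2,}         # 2+ uppercase letters
      |
      [A-Z]+-[A-Z]+     # uppercase letters, '-', uppercase letters
    )
    (?!                 # not followed by ...
      \w                # a word char
      |                 # or
      -(?=\w)           # a '-' that is followed by a word char
    )
""", re.VERBOSE)

matches = pattern.findall(s)
hyphen_edge_case = pattern.findall("co-OP")

print(matches)
print(hyphen_edge_case)
```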
