Python regular expression error

Question

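I'm new to regular expressions and would really appreciate any help here. I have to parse a file containing strings in the following formats (the main difference is that the second string has an extra segment, "_de*", in the middle):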

Abc_p123 abc_ghi_data
or
Abc_de*_p123 abc_ghi_data

I can write separate regular expressions that match the first and the second string individually:

data_lst = re.findall('([a-zA-Z0-9]+_p\\d{3})\\s.*_data.*', content, re.IGNORECASE)
data_lst = re.findall('([a-zA-Z0-9]+_[a-zA-Z]+_p\\d{3})\\s.*_data.*', content, re.IGNORECASE)

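Can someone show me how to combine the two findall regular expressions so that a single one works for both strings? I could still build one combined list by appending the result of the second findall call to the first list. However, I'm sure there is a way to handle this in a single findall regex statement. I tried putting ".*" in the middle, but that gave an error.

Please advise. Thanks,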

Answer 1

You are very close:

([a-zA-Z0-9]+(?:_[a-zA-Z]+\*)?_p\d{3})\s.*_data.*

This is the important part:

(?:_[a-zA-Z]+\*)?

It says: optionally match an underscore, followed by one or more letters (a-z, A-Z), followed by a literal asterisk.

https://regex101.com/r/5XCsPK/1
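As a minimal sketch of how this combined pattern behaves with re.findall, the snippet below runs it over the two sample lines from the question. The content string is an assumed stand-in for the file's contents, and the pattern is written as a raw string so the backslashes do not need doubling.

```python
import re

# Assumed stand-in for the file contents: the two sample lines from the question.
content = "Abc_p123 abc_ghi_data\nAbc_de*_p123 abc_ghi_data\n"

# The optional non-capturing group (?:_[a-zA-Z]+\*)? covers the extra "_de*" segment.
pattern = r'([a-zA-Z0-9]+(?:_[a-zA-Z]+\*)?_p\d{3})\s.*_data.*'

# With exactly one capturing group, findall returns a list of strings.
data_lst = re.findall(pattern, content, re.IGNORECASE)
print(data_lst)  # ['Abc_p123', 'Abc_de*_p123']
```

Because the optional part is a non-capturing group, the result stays a flat list of the identifiers, which matches the shape of the two original findall calls.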

Answer 2

You can try

([a-zA-Z0-9]+(_[a-zA-Z]+)?_p\d{3})\s.*_data.*

I replaced _[a-zA-Z]+ with (_[a-zA-Z]+)? to make it optional.

If you don't want the extra capture group, add ?: like this: (?:_[a-zA-Z]+)?

Demo: https://regex101.com/r/5xynlx/2
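One practical reason to prefer the non-capturing form: when a pattern contains more than one capturing group, re.findall returns a list of tuples rather than a list of strings. The sketch below illustrates the difference on an assumed sample input whose middle segment is "_de" without an asterisk, since [a-zA-Z]+ in this pattern matches letters only. The variable names are illustrative.

```python
import re

# Assumed sample input; the optional middle segment contains letters only.
content = "Abc_p123 abc_ghi_data\nAbc_de_p123 abc_ghi_data\n"

# Two capturing groups: findall returns one tuple per match,
# with '' for the optional group when it did not participate.
with_capture = re.findall(r'([a-zA-Z0-9]+(_[a-zA-Z]+)?_p\d{3})\s.*_data.*', content)
print(with_capture)  # [('Abc_p123', ''), ('Abc_de_p123', '_de')]

# One capturing group: findall returns plain strings.
without_capture = re.findall(r'([a-zA-Z0-9]+(?:_[a-zA-Z]+)?_p\d{3})\s.*_data.*', content)
print(without_capture)  # ['Abc_p123', 'Abc_de_p123']
```

If the real data does contain a literal asterisk in the middle segment, the character class would need to allow it, for example [a-zA-Z*]+.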

Answer 3

Use

([a-zA-Z0-9]+(?:_[a-zA-Z0-9*]+)?_p\d{3})\s.*_data

See proof.

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      _                        '_'
--------------------------------------------------------------------------------
      [a-zA-Z0-9*]+            any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9', '*' (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    _p                       '_p'
--------------------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  _data                    '_data'
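The breakdown above can also be kept next to the pattern itself. The following is a sketch, assuming the same two sample lines as input, that writes the pattern with re.VERBOSE so each part carries its own comment; the names PATTERN and content are illustrative.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so that
# whitespace is ignored and each part can carry a comment.
PATTERN = re.compile(r"""
    (                       # group 1: the identifier to extract
      [a-zA-Z0-9]+          #   leading alphanumeric run, e.g. Abc
      (?:_[a-zA-Z0-9*]+)?   #   optional middle segment, e.g. _de*
      _p\d{3}               #   _p followed by three digits
    )
    \s.*_data               # whitespace, then the rest of the line up to _data
""", re.VERBOSE)

# Assumed stand-in for the file contents.
content = "Abc_p123 abc_ghi_data\nAbc_de*_p123 abc_ghi_data\n"

matches = PATTERN.findall(content)
print(matches)  # ['Abc_p123', 'Abc_de*_p123']
```

Compiling the pattern once also avoids re-parsing it if the same expression is applied to many files.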

3 answers

Answer 1: score 2, accepted, 2020-11-02 20:37:52
Answer 2: score 1, 2020-11-02 20:38:09
Answer 3: score 0, 2020-11-02 20:37:11
