How to match this pattern with a regular expression in Python

Question

I have a list of names written in different notations, for example:

myList = ['ab2000', 'abc2000_2000', 'AB2000', 'ab2000_1', 'ABC2000_01', 'AB2000_2', 'ABC2000_02', 'AB2000_A1']

The standardized versions of these notations are, for example:

'ab2000' is 'ABC2000'
'ab2000_1' is 'ABC2000_01'
'AB2000_A1' is 'ABC2000_A1'

What I tried was splitting the string into its components with a compiled regular expression.

Input:

import re
compiled = re.compile(r'[A-Za-z]+|\d+|\W+')
compiled.findall("AB2000_2000_A1")

Output:

characters = ['AB', '2000', '2000', 'A', '1']
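An editorial check, not part of the original question, of why the underscores disappear: "_" is a word character (it belongs to \w), so the \W+ alternative never matches anything in these names, and findall simply skips the underscores. It also never joins "A" and "1", because letters and digits are handled by separate alternatives.

```python
import re

sample = "AB2000_2000_A1"

# "_" is part of \w, so \W+ (non-word characters) finds nothing in this name.
non_word = re.findall(r"\W+", sample)

# The question's tokenizer: underscores are skipped and "A1" is split in two.
tokens = re.findall(r"[A-Za-z]+|\d+|\W+", sample)

print(non_word)  # []
print(tokens)    # ['AB', '2000', '2000', 'A', '1']
```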

Then applying:

characters = list(set(characters))
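A side note on this step (editorial, not from the thread): a set keeps each value only once and has no defined order, so list(set(...)) collapses the two '2000' tokens and may reorder the rest. That conflicts with the desired output further down, which keeps both '2000' entries. If de-duplication is really wanted, dict.fromkeys keeps the first-seen order.

```python
characters = ['AB', '2000', '2000', 'A', '1']

# set(): duplicates are dropped and the order is arbitrary.
deduped = list(set(characters))

# dict.fromkeys(): duplicates are dropped but first-seen order is kept.
ordered = list(dict.fromkeys(characters))

print(sorted(deduped))  # ['1', '2000', 'A', 'AB']
print(ordered)          # ['AB', '2000', 'A', '1']
```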

Finally, I try to match the values in that list against the main components of the string: an alphabetic part, followed by a numeric part, followed by an alphanumeric part.

But as the previous output shows, I can't get 'A1' as a single token using \W+. My desired output is:

characters = ['AB', '2000', '2000', 'A1']

Any idea how to fix that?

Or any better idea for solving my problem in general? Thank you in advance.
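One possible tokenizer-level fix, sketched here as an editorial assumption rather than taken from the thread: a lookbehind alternative takes the whole alphanumeric run whenever it starts right after an underscore, so "A1" stays together, while the leading part is still split into letters and digits. The names TOKEN and tokenize are hypothetical.

```python
import re
from typing import List

# Hypothetical tokenizer: after "_", take the whole alphanumeric run;
# otherwise split letters from digits as in the original attempt.
TOKEN = re.compile(r"(?<=_)[A-Za-z\d]+|[A-Za-z]+|\d+")


def tokenize(name: str) -> List[str]:
    """Split a name such as 'AB2000_2000_A1' into its components."""
    return TOKEN.findall(name)


print(tokenize("AB2000_2000_A1"))  # ['AB', '2000', '2000', 'A1']
print(tokenize("ab2000"))          # ['ab', '2000']
```

Underscores themselves match none of the alternatives, so findall skips them, just as in the original attempt.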

Answer 1

Use the following pattern, which wraps capturing groups in optional non-capturing groups:

r'([A-Z]+)(\d+)(?:_([A-Z\d]+))?(?:_([A-Z\d]+))?'

together with the re.I (case-insensitive) flag.
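For readability, the same pattern can be written with re.VERBOSE so that each part carries a comment. This is an editorial restatement of the answer's pattern, not code from the answer itself:

```python
import re

# The answer's pattern, spread out with re.VERBOSE (whitespace and comments ignored).
pat = re.compile(r"""
    ([A-Z]+)            # group 1: letter prefix, e.g. AB
    (\d+)               # group 2: digits, e.g. 2000
    (?:_([A-Z\d]+))?    # group 3: optional part after the first underscore
    (?:_([A-Z\d]+))?    # group 4: optional part after the second underscore
""", re.I | re.VERBOSE)

print(pat.match("AB2000_2000_A1").groups())  # ('AB', '2000', '2000', 'A1')
print(pat.match("ab2000").groups())          # ('ab', '2000', None, None)
```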

Note that (?:_([A-Z\d]+))? must be written twice in order to capture both the third and the fourth group. If you tried to "repeat" this group by writing it once with "*", only its last repetition would be captured, so the third group would be lost.
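A small comparison of the two forms on the longest name from the test list. With "*" the same capturing group is reused on every repetition, so Python keeps only its last value:

```python
import re

name = "AB2000_2000_A1"

# One group repeated with "*": only the last repetition's capture survives.
starred = re.compile(r"([A-Z]+)(\d+)(?:_([A-Z\d]+))*", re.I)

# The answer's form: the optional group is written out twice, one capture each.
twice = re.compile(r"([A-Z]+)(\d+)(?:_([A-Z\d]+))?(?:_([A-Z\d]+))?", re.I)

print(starred.match(name).groups())  # ('AB', '2000', 'A1')
print(twice.match(name).groups())    # ('AB', '2000', '2000', 'A1')
```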

To test it, I ran the following:

import re
myList = ['ab2000', 'abc2000_2000', 'AB2000', 'ab2000_1', 'ABC2000_01',
    'AB2000_2', 'ABC2000_02', 'AB2000_A1', 'AB2000_2000_A1']
pat = re.compile(r'([A-Z]+)(\d+)(?:_([A-Z\d]+))?(?:_([A-Z\d]+))?', re.I)
for tt in myList:
    print(f'{tt:16} ', end=' ')
    mtch = pat.match(tt)
    if mtch:
        for it in mtch.groups():
            if it is not None:
                print(f'{it:5}', end=' ')
    print()

This prints:

ab2000            ab    2000  
abc2000_2000      abc   2000  2000  
AB2000            AB    2000  
ab2000_1          ab    2000  1     
ABC2000_01        ABC   2000  01    
AB2000_2          AB    2000  2     
ABC2000_02        ABC   2000  02    
AB2000_A1         AB    2000  A1    
AB2000_2000_A1    AB    2000  2000  A1
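Going one step further toward the original goal, here is a hypothetical normalizer built on the answer's pattern. The rules are an assumption inferred only from the three examples in the question: every prefix becomes "ABC", purely numeric suffixes are zero-padded to two digits ("1" becomes "01"), and alphanumeric suffixes such as "A1" are upper-cased. It uses fullmatch rather than match so that names with trailing garbage are rejected instead of partially matched.

```python
import re

# The answer's pattern: letters, digits, then up to two "_"-separated parts.
PATTERN = re.compile(r"([A-Z]+)(\d+)(?:_([A-Z\d]+))?(?:_([A-Z\d]+))?", re.I)


def standardize(name: str, prefix: str = "ABC") -> str:
    """Hypothetical normalizer; rules inferred from the question's examples."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized name: {name!r}")
    _letters, number, *suffixes = m.groups()
    parts = [prefix + number]
    for suffix in suffixes:
        if suffix is None:
            continue
        # Numeric suffixes are padded to two digits; others are upper-cased.
        parts.append(suffix.zfill(2) if suffix.isdigit() else suffix.upper())
    return "_".join(parts)


for raw in ["ab2000", "ab2000_1", "AB2000_A1"]:
    print(raw, "->", standardize(raw))
```

If the real prefix or padding rules differ, only the prefix argument and the zfill call need to change.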

Answer 1 was accepted (2020-07-07 16:07:20).