python3: parse a string using a regular expression (contains "*")

Question

Let's say a string follows a pattern like this: (\d+)(X|Y|Z)(!|#)?
Digits appear, then one of X, Y, or Z, then ! or #. The ! or # does not always appear.

I want to parse the string and return a list.

ex1) str = 238Z!32Z#11234X
I want to return [238Z!, 32Z#, 11234X]

ex2) str = 91X92Y93Z
I want to return [91X, 92Y, 93Z]

Below is my code.

# your code goes here
import re

p=re.compile('^(\d+)(X|Y|Z)(!|#)?$')
L=p.findall("238Z!32Z!11234X")
print(L)

But I got an empty list [].
What am I doing wrong?

Answer 1

Don't use ^ and $ in the regex. ^ matches the start of the string (or of each line in multiline mode), and $ matches the end. That means your regex will only match a string that consists of exactly one such token from start to end.

import re

p=re.compile(r'(\d+)(X|Y|Z)(!|#)?')
L=p.findall("238Z!32Z!11234X")
print(L)

Output:

[('238', 'Z', '!'), ('32', 'Z', '!'), ('11234', 'X', '')]

If you don't want tuples, but the whole strings that were matched, don't use capturing groups:

p=re.compile(r'(?:\d+)(?:X|Y|Z)(?:!|#)?')

Output:

['238Z!', '32Z!', '11234X']
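If both the whole token and its parts are needed, one option is to keep the capturing groups and iterate with re.finditer instead of findall. This is only a sketch; the variable names (pattern, text, whole, parts) are illustrative and not from the question. Note that an optional group that did not take part in the match shows up as None in groups(), whereas findall reports it as an empty string.

```python
import re

# Same pattern as in the question, minus the ^ and $ anchors.
pattern = re.compile(r'(\d+)(X|Y|Z)(!|#)?')
text = "238Z!32Z#11234X"

# group(0) is the whole matched token; groups() holds the captured parts.
whole = [m.group(0) for m in pattern.finditer(text)]
parts = [m.groups() for m in pattern.finditer(text)]

print(whole)  # ['238Z!', '32Z#', '11234X']
print(parts)  # [('238', 'Z', '!'), ('32', 'Z', '#'), ('11234', 'X', None)]
```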

Answer 2

First, ^ and $ are metacharacters that match the start and end of your string, not of each occurrence of the pattern. So you have to remove them so that your regex can find every occurrence.

Second, the findall function returns a list of groups if your pattern contains at least one group (a list of tuples if it contains several). Groups are defined by the parentheses in your pattern. You should use a non-capturing group (?:...) instead.

import re

p = re.compile(r'(?:\d+)(?:X|Y|Z)(?:!|#)?')
L = p.findall("238Z!32Z!11234X")
print(L)
# ['238Z!', '32Z!', '11234X']

Another piece of advice when writing a regex: if you want to match one character from a set, you do not need (a|b|c); you can use the character class [abc], which has the same meaning.

Moreover, you do not need parentheses to quantify a single element. (\d+) matches the same text as \d+, and without the group you will not have the group problem anymore.

Your regex would then become:

\d+[XYZ][!#]?
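As a quick check, here is that simplified regex applied to both examples from the question. The helper name split_tokens is made up for this sketch.

```python
import re

# Digits, then one of X/Y/Z, then an optional ! or #. No groups, no anchors.
TOKEN = re.compile(r'\d+[XYZ][!#]?')

def split_tokens(s):
    """Return every token found in s, in order (hypothetical helper)."""
    return TOKEN.findall(s)

print(split_tokens("238Z!32Z#11234X"))  # ['238Z!', '32Z#', '11234X']
print(split_tokens("91X92Y93Z"))        # ['91X', '92Y', '93Z']
```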

Answer 3

You should not use the ^ or $ anchors, as they require the entire string to match a single occurrence of the pattern.

Also, don't use capture groups if you want to get the desired result:

p=re.compile(r'\d+[XYZ][!#]?')

['238Z!', '32Z!', '11234X']
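To see the effect of the anchors directly, this small sketch runs an anchored version of the pattern against a single token and against the full input from the question. The names anchored, single, and several are illustrative only.

```python
import re

# With ^ and $, the pattern must account for the entire string.
anchored = re.compile(r'^\d+[XYZ][!#]?$')

single = anchored.findall("238Z!")             # the string is exactly one token
several = anchored.findall("238Z!32Z!11234X")  # several tokens in a row

print(single)   # ['238Z!']
print(several)  # []
```

The second call returns an empty list because no single token spans the whole string, which is the behaviour the question ran into.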
