python3 parse string(contain '*') using regular expression
Let's say a string follows this pattern:
(\d+)(X|Y|Z)(!|#)?
That is: digits, then one of X, Y or Z, then ! or #.
The ! or # does not always appear.
I want to parse the string and return a list.
ex1) str = 238Z!32Z#11234X
I want to return [238Z!, 32Z#, 11234X]
ex2) str = 91X92Y93Z
I want to return [91X, 92Y, 93Z]
Below is my code.
# your code goes here
import re
p=re.compile('^(\d+)(X|Y|Z)(!|#)?$')
L=p.findall("238Z!32Z!11234X")
print(L)
But I got an empty list, []. What am I doing wrong?
Don't use ^ and $ in the regex. ^ matches the start of the string and $ matches the end (they match at line boundaries only when re.MULTILINE is set). That means your regex will only match when the whole string is a single occurrence of the pattern.
import re
p=re.compile(r'(\d+)(X|Y|Z)(!|#)?')
L=p.findall("238Z!32Z!11234X")
print(L)
Output:
[('238', 'Z', '!'), ('32', 'Z', '!'), ('11234', 'X', '')]
If you want the whole matched strings instead of tuples, don't use capturing groups:
p=re.compile(r'(?:\d+)(?:X|Y|Z)(?:!|#)?')
Output:
['238Z!', '32Z!', '11234X']
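To see the effect of the anchors in isolation, here is a minimal sketch that runs the anchored and unanchored patterns against a single token and against the full string. The variable names (anchored, unanchored, single, combined) are only illustrative.

```python
import re

# Same pattern, with and without the ^ ... $ anchors.
anchored = re.compile(r'^(\d+)(X|Y|Z)(!|#)?$')
unanchored = re.compile(r'(\d+)(X|Y|Z)(!|#)?')

single = "238Z!"              # exactly one occurrence of the pattern
combined = "238Z!32Z!11234X"  # three occurrences back to back

# The anchored pattern matches only when the whole string is one token.
print(anchored.findall(single))      # [('238', 'Z', '!')]
print(anchored.findall(combined))    # []

# Without anchors, findall picks up every occurrence.
print(unanchored.findall(combined))
# [('238', 'Z', '!'), ('32', 'Z', '!'), ('11234', 'X', '')]
```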
First, ^ and $ are metacharacters that match the start and end of your string (not of the pattern). So you have to remove them so that your regex can find every occurrence of the pattern.
Second, the findall function returns a list of groups, rather than whole matches, if your pattern contains at least one capturing group. Groups are defined by the parentheses in your pattern. You should use a non-capturing group, (?:...), instead.
import re
p = re.compile(r'(?:\d+)(?:X|Y|Z)(?:!|#)?')
L = p.findall("238Z!32Z!11234X")
print(L)
# ['238Z!', '32Z!', '11234X']
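The way findall changes its return value with the number of capturing groups can be sketched as follows. The variable names are illustrative, and the last line shows one possible alternative (finditer with group(0)) for getting whole matches while keeping the groups.

```python
import re

text = "238Z!32Z!11234X"

# No capturing groups: findall returns the whole matches.
no_groups = re.findall(r'(?:\d+)(?:X|Y|Z)(?:!|#)?', text)
print(no_groups)    # ['238Z!', '32Z!', '11234X']

# Exactly one capturing group: findall returns only that group's text.
one_group = re.findall(r'(\d+)(?:X|Y|Z)(?:!|#)?', text)
print(one_group)    # ['238', '32', '11234']

# Several capturing groups: findall returns one tuple per match.
many_groups = re.findall(r'(\d+)(X|Y|Z)(!|#)?', text)
print(many_groups)  # [('238', 'Z', '!'), ('32', 'Z', '!'), ('11234', 'X', '')]

# Alternative: keep the groups and ask each match object for group(0).
whole = [m.group(0) for m in re.finditer(r'(\d+)(X|Y|Z)(!|#)?', text)]
print(whole)        # ['238Z!', '32Z!', '11234X']
```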
Another piece of advice when writing a regex: if you want to match one character out of a set, you do not need (a|b|c); you can use [abc], which matches the same thing.
Moreover, you do not need parentheses to quantify a single element. (\d+) matches the same text as \d+, and without the parentheses you no longer have any group problem.
Your regex would then become:
\d+[XYZ][!#]?
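As a quick check, this sketch applies the simplified pattern to both examples from the question and compares it with the non-capturing-group version. TOKEN, VERBOSE and split_tokens are hypothetical names chosen for illustration.

```python
import re

# Simplified pattern: character classes instead of alternations, no groups.
TOKEN = re.compile(r'\d+[XYZ][!#]?')
# The longer non-capturing-group form, for comparison.
VERBOSE = re.compile(r'(?:\d+)(?:X|Y|Z)(?:!|#)?')


def split_tokens(s):
    """Return every occurrence of the pattern in s as a list of strings."""
    return TOKEN.findall(s)


print(split_tokens("238Z!32Z#11234X"))  # ['238Z!', '32Z#', '11234X']
print(split_tokens("91X92Y93Z"))        # ['91X', '92Y', '93Z']

# Both spellings find the same tokens in these inputs.
for s in ("238Z!32Z#11234X", "91X92Y93Z"):
    assert TOKEN.findall(s) == VERBOSE.findall(s)
```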
You should not use the ^ or $ anchors, as they require the whole string to match a single occurrence of the pattern.
Also, don't use capture groups if you want to get the desired result:
p=re.compile(r'\d+[XYZ][!#]?')
['238Z!', '32Z!', '11234X']
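If the original intent behind the anchors was to reject strings that contain anything other than valid tokens, one possible approach is to validate the whole string first and then split it. This is only a sketch under that assumption; parse_strict is a hypothetical helper, not something from the question.

```python
import re

TOKEN = r'\d+[XYZ][!#]?'


def parse_strict(s):
    """Hypothetical helper: return the tokens in s, or raise ValueError
    if s is not made up entirely of one or more tokens."""
    # fullmatch succeeds only if the repeated token covers the whole string.
    if not re.fullmatch(rf'(?:{TOKEN})+', s):
        raise ValueError(f"not a sequence of tokens: {s!r}")
    return re.findall(TOKEN, s)


print(parse_strict("91X92Y93Z"))  # ['91X', '92Y', '93Z']

try:
    parse_strict("91X??92Y")
except ValueError as exc:
    print("rejected:", exc)
```

Plain findall silently skips characters that do not belong to any token, so the extra fullmatch check is only worth adding when malformed input should be treated as an error.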