I have a bunch of strings that come in this form:
#q1_a1
#q7
Basically, # is a marker that has to be ignored. After the #, there is a single letter followed by a number. Optionally, another letter + number combination can follow after an _ (underscore).
Here's what I came up with:
>>> pat = re.compile(r"#(.*)_?(.+)?")
>>> pat.match('#q1').groups()
('q1', None)
The problem is with strings in the #q1_a1 format. When I apply my pattern to such strings:
>>> pat.findall('#q1_f1')
[('q1_f1', '')]
Any suggestions?
As the others have said, the more specific your regex, the less likely it is to match something it shouldn't:
In [13]: re.match(r'#([A-Za-z][0-9])(?:_([A-Za-z][0-9]))?', '#q1_a1').groups()
Out[13]: ('q1', 'a1')
In [14]: re.match(r'#([A-Za-z][0-9])(?:_([A-Za-z][0-9]))?', '#q1').groups()
Out[14]: ('q1', None)
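Notes:
If the string must contain nothing but the tag, anchor the pattern with ^ and $.
If the number can have more than one digit, change [0-9] to [0-9]+.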
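For illustration, here is a minimal sketch that combines both notes, assuming the number may have several digits and the whole string must be a tag. The names TAG and parse_tag are hypothetical and not part of the original answer.

```python
import re

# Assumption: one letter followed by one or more digits,
# optionally followed by "_" and a second letter+digits part.
TAG = re.compile(r'^#([A-Za-z][0-9]+)(?:_([A-Za-z][0-9]+))?$')

def parse_tag(s):
    """Return the captured groups, or None if s is not a well-formed tag."""
    m = TAG.match(s)
    return m.groups() if m else None

print(parse_tag('#q1_a1'))   # ('q1', 'a1')
print(parse_tag('#q12'))     # ('q12', None)
print(parse_tag('#q1_a1x'))  # None, trailing text is rejected by $
```

Without the $ anchor, re.match would accept '#q1_a1x' and silently ignore the trailing x.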
Your ".*" also matches the underscore, because the match is greedy. It is better to write a more specific regex that excludes the underscore from the first group.
A more suitable regex could look like this:
#([a-z][0-9])_?([a-z][0-9])?
but you need to check whether it works for all the data you expect.
P.S. Being more specific in regular expressions is better, as you get fewer false positives.
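One way to do that check is to run the candidate patterns over a handful of sample strings. The sketch below is only an illustration: the sample list and the helper name check are made up, and "strict" is the variant that wraps the underscore and the second part in one optional non-capturing group.

```python
import re

# "loose" is the pattern above; "strict" makes "_" and the second part optional together.
loose = re.compile(r'#([a-z][0-9])_?([a-z][0-9])?')
strict = re.compile(r'#([a-z][0-9])(?:_([a-z][0-9]))?')

def check(pattern, samples):
    """Map each sample to its captured groups, or None when it does not fully match."""
    result = {}
    for s in samples:
        m = pattern.fullmatch(s)
        result[s] = m.groups() if m else None
    return result

# Hypothetical test data: two valid tags and two malformed ones.
samples = ['#q1_a1', '#q7', '#q1a1', '#q1_']
for name, pat in (('loose', loose), ('strict', strict)):
    print(name, check(pat, samples))
```

With these samples, the loose pattern accepts the malformed '#q1a1' and '#q1_', while the strict one rejects both.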
When you use .*, it matches greedily, consuming as many characters as possible, including the underscore. Try:
>>> pat = re.compile(r"#([^_]*)_?(.+)?")
>>> pat.findall('#q1_f1')
[('q1', 'f1')]
It is also better to write a more specific expression:
#([a-z][0-9])(?:_([a-z][0-9]))?
A simple alternative without using regex:
s = '#q7'
print(s[1:].split('_'))
# ['q7']
s = '#q1_a1'
print(s[1:].split('_'))
# ['q1', 'a1']
This assumes all of your strings start with #. If that's not the case, you could easily add some validation:
s = '#q1_a1'
if s.startswith('#'):
    print(s[1:].split('_'))
# ['q1', 'a1']
s = 'q1_a1'
if s.startswith('#'):
    print(s[1:].split('_'))  # Nothing is printed
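The same idea can be wrapped in a small function that returns a pair shaped like the regex groups. This is only a sketch: split_tag is a hypothetical name, and it does not verify that each part is a letter followed by digits.

```python
def split_tag(s):
    """Split '#q1_a1' into ('q1', 'a1'); return None if s lacks the leading '#'."""
    if not s.startswith('#'):
        return None
    # partition splits on the first "_" only; sep is '' when there is no underscore
    first, sep, second = s[1:].partition('_')
    return (first, second if sep else None)

print(split_tag('#q1_a1'))  # ('q1', 'a1')
print(split_tag('#q7'))     # ('q7', None)
print(split_tag('q1_a1'))   # None
```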