Python regular expression grouping

My regular expression goal:

"If the sentence has a '#' in it, group all the stuff to the left of the '#' and group all the stuff to the right of the '#'. If the sentence doesn't have a '#', then just return the entire sentence as one group"

Examples of the two cases:

A) '120x4#Words' -> ('120x4', 'Words')
B) '120x4@9.5' -> ('120x4@9.5')

I made a regular expression that parses case A correctly

(.*)(?:#(.*))

# List the groups found
>>> r.groups()
(u'120x4', u'words')

But of course this won't work for case B -- I need to make "# and everything to the right of it" optional

So I tried to use the '?' "zero or one" operator on that second grouping to indicate it's optional.
(.*)(?:#(.*))?

But it gives me bad results. The first grouping eats up the entire string.

# List the groups found
>>> r.groups()
(u'120x4#words', None)

I guess I'm either misunderstanding how the zero-or-one '?' operator works on groupings, or misunderstanding how the first group acts greedy and grabs the entire string. I did try to make the first group 'reluctant', but that matched nothing useful: the first group came back empty.

(.*?)(?:#(.*))?


# List the groups found
>>> r.groups()
(u'', None)

Simply use the standard str.split function:

s = '120x4#Words'
x = s.split( '#' )

If you still want a regex solution, use the following pattern:

([^#]+)(?:#(.*))?
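
As a rough sketch of how that pattern behaves on the two cases from the question: because the first group is `[^#]+`, it cannot run past a '#', so the optional tail gets a chance to match. The helper name `split_on_hash` is made up for illustration, and dropping the unmatched group is an assumption about the desired output shape.

```python
import re

# Pattern from the answer above: the left group stops at the first '#'.
PATTERN = re.compile(r'([^#]+)(?:#(.*))?')


def split_on_hash(text):
    """Hypothetical helper: return the groups that actually matched."""
    match = PATTERN.match(text)
    if match is None:
        # No match, e.g. for an empty string or one starting with '#'.
        return None
    # Group 2 is None when the optional '#...' tail is absent; drop it.
    return tuple(g for g in match.groups() if g is not None)


print(split_on_hash('120x4#Words'))  # ('120x4', 'Words')
print(split_on_hash('120x4@9.5'))    # ('120x4@9.5',)
```

Note that `[^#]+` requires at least one character before the '#', so a string such as '#Words' does not match at all.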

Use re.split:

>>> import re
>>> a='120x4#Words'
>>> re.split('#',a)
['120x4', 'Words']
>>> b='120x4@9.5'
>>> re.split('#',b)
['120x4@9.5']

(.*?)#(.*)|(.+)

This should work. See the demo:

http://regex101.com/r/oC3nN4/14
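
For reference, a small sketch of what this alternation returns in Python. The helper name `groups_for` is only illustrative. The point to notice is that the pattern has three groups, so the caller has to check which side of the `|` matched.

```python
import re

# Alternation from the answer above: groups 1 and 2 are filled when a '#'
# is present, group 3 is filled otherwise.
PATTERN = re.compile(r'(.*?)#(.*)|(.+)')


def groups_for(text):
    """Hypothetical helper: return all three groups, or None on no match."""
    match = PATTERN.match(text)
    return match.groups() if match else None


print(groups_for('120x4#Words'))  # ('120x4', 'Words', None)
print(groups_for('120x4@9.5'))    # (None, None, '120x4@9.5')
```

Groups that did not take part in the match come back as None, so filtering them out gives the shape asked for in the question.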

Here's a verbose re solution, but you're better off using str.split.

import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)
    (?:
        [#]
        (?P<right>.*)
    )?
    \Z
''', re.VERBOSE)


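def parse(text):
    match = REGEX.match(text)
    if match:
        # Drop only the unmatched (None) group; keep empty strings such as
        # the right-hand side of '120x4#'.
        return tuple(g for g in match.groups() if g is not None)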

print(parse('120x4#Words'))
print(parse('120x4@9.5'))

Better solution

def parse(text):
    # Split on the first '#' only; the positional count works on Python 2 and 3.
    return text.split('#', 1)

print(parse('120x4#Words'))
print(parse('120x4@9.5'))
