
Python regular expression grouping

My regular expression goal:

"If the sentence has a '#' in it, group everything to the left of the '#' and group everything to the right of the '#'. If the sentence doesn't have a '#', then just return the entire sentence as one group."

Examples of the two cases:

A) '120x4#Words' -> ('120x4', 'Words')
B) '120x4@9.5' -> ('120x4@9.5')

I made a regular expression that parses case A correctly:

(.*)(?:#(.*))

# List the groups found
>>> r.groups()
(u'120x4', u'words')

But of course this won't work for case B. I need to make "# and everything to the right of it" optional.

So I tried to use the '?' "zero or one" operator on that second grouping to indicate it's optional:

(.*)(?:#(.*))?

But it gives me bad results. The first grouping eats up the entire string.

# List the groups found
>>> r.groups()
(u'120x4#words', None)

I guess I'm either misunderstanding the zero-or-one '?' operator and how it works on groupings, or I'm misunderstanding how the first group acts greedy and grabs the entire string. I did try to make the first group 'reluctant', but that gave me a total no-match.

(.*?)(?:#(.*))?


# List the groups found
>>> r.groups()
(u'', None)
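
The three attempts above can be reproduced side by side. This is only a sketch, assuming the patterns are applied with re.match on Python 3 (so results print without the u prefix). The helper name show is hypothetical, and the last pattern, with a trailing $, is an added variation that does not appear in the question.

```python
import re

def show(pattern, text):
    """Return the captured groups, or None if the pattern does not match."""
    match = re.match(pattern, text)
    return match.groups() if match else None

text = '120x4#Words'

# Mandatory '#': the greedy first group backtracks until a '#' can match.
required = show(r'(.*)(?:#(.*))', text)

# Optional '#': the greedy first group takes everything, the optional part
# matches zero times, and nothing forces the engine to backtrack.
greedy = show(r'(.*)(?:#(.*))?', text)

# Lazy first group: it matches the empty string, the optional part is
# skipped, and the overall match succeeds right away at position 0.
lazy = show(r'(.*?)(?:#(.*))?', text)

# Added variation (an assumption, not from the question): anchoring the end
# with '$' forces the lazy group to grow until the rest can match.
anchored = show(r'(.*?)(?:#(.*))?$', text)

print(required)  # ('120x4', 'Words')
print(greedy)    # ('120x4#Words', None)
print(lazy)      # ('', None)
print(anchored)  # ('120x4', 'Words')
```

Note that the lazy attempt is not a failed match: it is a successful match of the empty string, which is why groups() returns ('', None) instead of raising an error.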

Simply use the standard str.split function:

s = '120x4#Words'
x = s.split('#')  # ['120x4', 'Words']; a string without '#' gives a one-element list

If you still want a regex solution, use the following pattern:

([^#]+)(?:#(.*))?
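
As a quick check of that pattern against both cases from the question, here is a small sketch assuming re.match on Python 3. The names PATTERN and split_hash are made up for illustration.

```python
import re

# First group: one or more non-'#' characters.
# Optional part: a '#' followed by the rest of the string.
PATTERN = re.compile(r'([^#]+)(?:#(.*))?')

def split_hash(text):
    """Return (left, right), with right set to None when there is no '#'."""
    match = PATTERN.match(text)
    return match.groups() if match else None

print(split_hash('120x4#Words'))  # ('120x4', 'Words')
print(split_hash('120x4@9.5'))    # ('120x4@9.5', None)
print(split_hash('#Words'))       # None
```

Because [^#]+ needs at least one character, input that begins with '#' does not match at all, so callers should be ready to handle None.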

Use re.split:

>>> import re
>>> a='120x4#Words'
>>> re.split('#',a)
['120x4', 'Words']
>>> b='120x4@9.5'
>>> re.split('#',b)
['120x4@9.5']

Another option is this pattern:

(.*?)#(.*)|(.+)

This should work. See the demo:

http://regex101.com/r/oC3nN4/14
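
One thing to keep in mind with the alternation is that the pattern defines three groups, and only the branch that matched fills its groups. The sketch below, assuming re.match on Python 3, drops the unused ones; the names PATTERN and split_hash_alt are hypothetical.

```python
import re

# Branch 1: lazy left part, a '#', then the right part (groups 1 and 2).
# Branch 2: the whole string when branch 1 cannot match (group 3).
PATTERN = re.compile(r'(.*?)#(.*)|(.+)')

def split_hash_alt(text):
    """Return only the groups from the branch that actually matched."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return tuple(g for g in match.groups() if g is not None)

print(PATTERN.match('120x4#Words').groups())  # ('120x4', 'Words', None)
print(PATTERN.match('120x4@9.5').groups())    # (None, None, '120x4@9.5')
print(split_hash_alt('120x4#Words'))          # ('120x4', 'Words')
print(split_hash_alt('120x4@9.5'))            # ('120x4@9.5',)
```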

Here's a verbose re solution. But you're better off using str.split.

import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)
    (?:
        [#]
        (?P<right>.*)
    )?
    \Z
''', re.VERBOSE)


def parse(text):
    match = REGEX.match(text)
    if match:
        return tuple(filter(None, match.groups()))

print(parse('120x4#Words'))
print(parse('120x4@9.5'))
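
One caveat with filter(None, ...) is that it drops empty strings as well as None, so an input such as '#Words' comes back as a single group. If that matters, a variant that only removes None could look like the sketch below. The function name parse_keep_empty is an assumption for illustration, and the regex is the same as above with comments added.

```python
import re

REGEX = re.compile(r'''
    \A
    (?P<left>.*?)      # as little as possible before the separator
    (?:
        [#]            # literal separator, kept in a class for VERBOSE mode
        (?P<right>.*)  # everything after the first separator
    )?
    \Z
''', re.VERBOSE)

def parse_keep_empty(text):
    """Like parse(), but keep empty strings and drop only missing groups."""
    match = REGEX.match(text)
    if match is None:
        return None
    return tuple(g for g in match.groups() if g is not None)

print(parse_keep_empty('120x4#Words'))  # ('120x4', 'Words')
print(parse_keep_empty('120x4@9.5'))    # ('120x4@9.5',)
print(parse_keep_empty('#Words'))       # ('', 'Words')
```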

Better solution

def parse(text):
    return text.split('#', maxsplit=1)

print(parse('120x4#Words'))
print(parse('120x4@9.5'))
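
To see what maxsplit=1 changes, it helps to try an input with more than one '#'. This is a sketch assuming Python 3, where str.split accepts maxsplit as a keyword; on Python 2 the count would have to be passed positionally, as in text.split('#', 1). The input 'a#b#c' is an invented example.

```python
def parse(text):
    # Split on the first '#' only, so the result has one or two items.
    return text.split('#', maxsplit=1)

print(parse('120x4#Words'))  # ['120x4', 'Words']
print(parse('120x4@9.5'))    # ['120x4@9.5']

# With several '#' characters, everything after the first stays together.
print(parse('a#b#c'))        # ['a', 'b#c']
print('a#b#c'.split('#'))    # ['a', 'b', 'c']

# split returns a list; wrap it in tuple() if a tuple is wanted.
print(tuple(parse('120x4#Words')))  # ('120x4', 'Words')
```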
