Python Regex - Match multiple expressions with groups

I have a string:

property1=1234, property2=102.201.333, property3=abc

I want to capture 1234 and 102.201.333. I am trying to use the regex:

property1=([^,]*)|property2=([^,]*)

But it only manages to capture one of the values. Based on this link I also tried:

((?:property1=([^,]*)|property2=([^,])+)
(?:(property1=([^,]*)|property2=([^,])+)

They capture an extra group from somewhere I can't figure.

What am I missing?

PS I am using re.search().

Edit: There may be something wrong in my calling code:

m = re.search('property1=([^,]*)|property2=([^,]*)', text)
print(m.groups())

Edit2: It doesn't have to be propertyX. It can be anything:

foo1=123, bar=101.2.3, foobar=abc

even

foo1=123, bar=weirdbar[345], foobar=abc

As an alternative, we could use some string splitting to create a dictionary.

text = "property1=1234, property2=102.201.333, property3=abc"
# Split into "key=value" pieces, then split each piece into a (key, value) pair.
data = dict(p.split('=') for p in text.split(', '))
print(data["property2"])  # '102.201.333'
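The one-liner above assumes every piece is separated by exactly ", " and contains exactly one "=". A slightly more tolerant sketch of the same splitting idea follows; the helper name split_pairs is made up for illustration, and skipping pieces without an "=" is an assumption about what you would want.

```python
def split_pairs(text):
    """Split on commas, then on the first '=' of each piece."""
    pairs = {}
    for piece in text.split(","):
        if "=" not in piece:
            continue  # skip empty or malformed pieces
        key, value = piece.split("=", 1)  # only the first '=' separates key and value
        pairs[key.strip()] = value.strip()
    return pairs

data = split_pairs("foo1=123, bar=weirdbar[345], foobar=abc")
print(data["bar"])  # weirdbar[345]
```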

Regular expressions are great for things that act like lexemes, not so good for general-purpose parsing.

In this case, though, it looks like your "configuration-y string" may consist solely of a sequence of lexemes of the form: word = value [ , word = value ... ]. If so, you can use a regexp and repetition. The right regexp depends on the exact form of word and value, though (and to a lesser extent, on whether you want to check for errors). For instance, is:

this="a string with spaces", that = 42, quote mark = "

allowed, or not? If so, is this set to a string with spaces (no quotes) or "a string with spaces" (quotes included)? Is that set to " 42" (with a leading blank) or just "42" (without one)? Is quote mark (which has an embedded space) allowed, and is it set to a single double-quote mark? Do double quotes, if present, "escape" commas, so that you can write:

greeting="Hello, world."

Assuming spaces are forbidden, and the word and value parts are simply "alphanumerics as matched by \w":

for word, value in re.findall(r'([\w]+)=([\w]+)', string):
    print(word, value)

It's clear from the 102.201.333 value that \w is not sufficient for the value match, though. If value is "everything not a comma" (which includes whitespace), then:

for word, value in re.findall(r'([\w]+)=([^,]+)', string):
    print(word, value)

gets closer. These all ignore "junk" and disallow spaces around the = sign. If string is "$a=this, b = that, c=102.201.333,,", the second for loop prints:

a this
c 102.201.333

The dollar sign (not an alphanumeric character) is ignored, the b pair is skipped because of the whitespace around its = sign, and the two commas after the value for c are also ignored.
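If the looser forms asked about earlier were allowed, the pattern would have to grow. The sketch below assumes a grammar that is not in the question: optional spaces around the = sign, and values that are either double-quoted (in which case commas inside the quotes are kept) or a plain run of non-comma characters. The names PAIR and parse_pairs are hypothetical.

```python
import re

# Assumed grammar: word, optional spaces, '=', optional spaces,
# then either a double-quoted value or a run of non-comma characters.
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^,]*))')

def parse_pairs(text):
    """Return a dict of word -> value; quotes are dropped, bare values are stripped."""
    result = {}
    for m in PAIR.finditer(text):
        word, quoted, bare = m.groups()
        # Exactly one of quoted/bare took part in the match.
        result[word] = quoted if quoted is not None else bare.strip()
    return result

parsed = parse_pairs('greeting="Hello, world.", b = that, c=102.201.333,,')
print(parsed)
```

With these assumptions b is no longer skipped and the comma inside the greeting survives. Whether that is right depends entirely on what your real input allows.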

You're using a | (alternation). That means your regex will match either the thing on the left of the bar or the thing on the right, and re.search() returns only the first match it finds.
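A small sketch of what that looks like with the pattern from the question. Using re.finditer() to visit every match is one possible way around it, not the only one.

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"
pattern = r"property1=([^,]*)|property2=([^,]*)"

# re.search stops at the first place the pattern matches. That is the
# left-hand alternative here, so the second group never takes part.
first = re.search(pattern, text)
print(first.groups())  # ('1234', None)

# re.finditer keeps scanning, so each alternative gets its turn.
# (Using "or" here assumes the captured values are non-empty.)
values = [m.group(1) or m.group(2) for m in re.finditer(pattern, text)]
print(values)  # ['1234', '102.201.333']
```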

You could try:

property_regex = re.compile(r'property[0-9]+=(?P<property_value>[^\s,]+)')

That would match any property value after the equals sign, up to the next comma or whitespace. The value is accessible through the name property_value, as the documentation describes:

Copied from the Python re documentation:

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
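A short sketch of reading a named group back, using the sample string from the question. The character class here excludes the comma as well as whitespace so that the separator is not captured along with the value, and the angle-bracket replacement text is only there to show \g<name> in action.

```python
import re

text = "property1=1234, property2=102.201.333, property3=abc"
# Values run up to the next comma or whitespace character.
property_regex = re.compile(r"property[0-9]+=(?P<property_value>[^\s,]+)")

# search() gives the first match; the group is read by name.
first = property_regex.search(text)
print(first.group("property_value"))  # 1234

# finditer() visits every match in the string.
values = [m.group("property_value") for m in property_regex.finditer(text)]
print(values)  # ['1234', '102.201.333', 'abc']

# The same name can be used in replacement text via \g<name>.
replaced = property_regex.sub(r"<\g<property_value>>", text)
print(replaced)  # <1234>, <102.201.333>, <abc>
```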

Try this:

property_regex = re.compile(r'property[0-9]+=([^\s,]+)')

I have tried building a regular expression for you which will give you the values after property1= and property2=, but I am not sure how you would use it in Python.

Edit

It now also handles names other than property before the '=' sign.

This is my original regular expression which does capture the value.

(?<=[\w]=).*?[^,]+

and this is a variation of the above with the global flag, so that every value is matched (in Python, which has no /.../g syntax, you would pass the bare pattern to re.findall() instead):

/(?<=[\w]=).*?[^,]+/g
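The /.../g delimiters belong to languages such as JavaScript or Perl. In Python the closest equivalent is probably re.findall(), which returns every match. A minimal sketch with the bare pattern, tried on one of the sample strings from the question:

```python
import re

text = "foo1=123, bar=weirdbar[345], foobar=abc"

# The lookbehind requires a word character followed by '=' immediately
# before the value, without making it part of the match.
values = re.findall(r"(?<=[\w]=).*?[^,]+", text)
print(values)  # ['123', 'weirdbar[345]', 'abc']
```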
