
Regular expression to transform string to list (Python)

A program outputs a file with lines in the following format:

{Foo} Bar Bacon {Egg}

where Foo and Egg may, but need not, consist of several words. Bar and Bacon are always single words.

I need to get Bar into a variable for the rest of my code. I imagine this would work if I split the string at a matching regular expression. That would return a list of the four elements, so I could easily get the second element with list[1].

How would I write such a regular expression?

I need to split the string on single spaces ' ', but only where that space is not inside text enclosed in curly braces.

\s(?=[a-zA-Z{}]) matches every space and therefore behaves exactly like splitting on ' '. How can I exclude the spaces inside the curly braces?
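For the split the question describes, one option is a negative lookahead: do not split on whitespace if a closing brace appears before the next opening brace, because that means the whitespace sits inside a {...} group. A minimal sketch, assuming braces are always balanced and never nested; SPLIT_OUTSIDE_BRACES and split_fields are hypothetical names:

```python
import re

# Split on runs of whitespace, except where the text that follows reaches a
# '}' before any '{' (meaning the whitespace is inside a {...} group).
# Assumes braces are balanced and never nested.
SPLIT_OUTSIDE_BRACES = re.compile(r'\s+(?![^{]*\})')

def split_fields(line):
    # strip() drops a trailing newline when lines come from a file
    return SPLIT_OUTSIDE_BRACES.split(line.strip())

fields = split_fields('{Foo Goo} Bar Bacon {Egg Foo}')
print(fields)     # ['{Foo Goo}', 'Bar', 'Bacon', '{Egg Foo}']
print(fields[1])  # Bar
```

The lookahead only inspects text to the right of each space, so it cannot detect unbalanced or nested braces; for input like that, a small hand-written parser is safer.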

This might help.

>>> import re
>>> line = '{Foo} Bar Bacon {Egg}'
>>> m = re.search(r'}\s+(\S+)\s+', line)
>>> m.group(1)
'Bar'
>>> 

I searched for the first word that follows a closing brace, and wrapped that word in parentheses () to form a capturing group, so it can be retrieved afterwards with m.group(1).
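A quick way to see what the parentheses change is to compare group(0), the whole match, with group(1), only the captured word, on the same line:

```python
import re

line = '{Foo} Bar Bacon {Egg}'
m = re.search(r'}\s+(\S+)\s+', line)

# group(0) (same as group() with no argument) is the entire matched text,
# including the brace and surrounding spaces; group(1) is just the word
# captured by the parentheses.
print(repr(m.group(0)))  # '} Bar '
print(m.group(1))        # Bar
```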

If you really want all four elements, try re.findall():

>>> line = '{Foo Goo} Bar Bacon {Egg Foo}'
>>> re.findall(r'{.*?}|\S+', line)
['{Foo Goo}', 'Bar', 'Bacon', '{Egg Foo}']
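Since the program writes one such line per record, the findall approach can be applied line by line, unpacking the four tokens directly. A sketch, assuming every line has exactly four fields; io.StringIO stands in for the real output file, whose name the question does not give:

```python
import io
import re

# Stand-in for the program's output file; with a real file you would
# iterate over open(...) instead.
output = io.StringIO(
    '{Foo} Bar Bacon {Egg}\n'
    '{Foo Goo} Bar2 Bacon2 {Egg Foo}\n'
)

bars = []
for line in output:
    # {.*?} takes a whole brace group lazily; \S+ takes a single word.
    # Unpacking raises ValueError if a line does not yield exactly 4 tokens.
    foo, bar, bacon, egg = re.findall(r'{.*?}|\S+', line)
    bars.append(bar)

print(bars)  # ['Bar', 'Bar2']
```

Letting the unpacking fail loudly on a malformed line is a deliberate choice here; wrap it in try/except ValueError if such lines should be skipped instead.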

You can try {[^}]*}\s(\w+):

>>> import re
>>> print(re.search(r'{[^}]*}\s(\w+)', '{Foo} Bar Bacon {Egg}').group(1))
Bar


Explanation:

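  • {[^}]*} first matches the leading section inside curly braces; [^}]* cannot cross a closing brace, so the match stops at the first }
  • \s then matches a single whitespace character
  • (\w+) then matches the second section (Bar); it is wrapped in a capturing group, so it is available in the match object as group(1)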
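The same idea extends to capturing every field at once with named groups. This is a sketch assuming the exact layout from the question, with single spaces between fields; LINE_RE is a hypothetical name:

```python
import re

# One named group per field. [^}]* keeps each brace group from running
# past its own closing brace, so Foo and Egg may contain several words.
LINE_RE = re.compile(
    r'\{(?P<foo>[^}]*)\}\s(?P<bar>\w+)\s(?P<bacon>\w+)\s\{(?P<egg>[^}]*)\}'
)

m = LINE_RE.fullmatch('{Foo Goo} Bar Bacon {Egg Foo}')
print(m.group('bar'))  # Bar
print(m.groupdict())
# {'foo': 'Foo Goo', 'bar': 'Bar', 'bacon': 'Bacon', 'egg': 'Egg Foo'}
```

fullmatch requires the entire string to match, so strip the trailing newline from lines read from a file before calling it; it returns None for any line that deviates from the layout.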

re.search(pattern, string, flags=0)

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

https://docs.python.org/3/library/re.html#re.search
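Because re.search can return None, calling .group(1) directly raises AttributeError on a line that does not match. A minimal guard using the {[^}]*}\s(\w+) pattern from the answer above; find_bar is a hypothetical helper name:

```python
import re

def find_bar(line):
    """Return the word after the first {...} group, or None if the line
    does not have that shape."""
    m = re.search(r'{[^}]*}\s(\w+)', line)
    # re.search returns None when no position matches, so check first
    return m.group(1) if m else None

print(find_bar('{Foo} Bar Bacon {Egg}'))  # Bar
print(find_bar('no braces here'))         # None
```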
