A program outputs a file with lines of the following format:
{Foo} Bar Bacon {Egg}
where Foo and Egg could, but do not have to, be made up of several words. Bar and Bacon are always a single word.
I need to get Bar into a variable for my further code. I imagine this would work if I split the string at a matching regular expression. That would return a list of the four elements, so I could easily get the second element with list[1].
How would I write such a regular expression?
I need to split the string on single spaces ' ', but only if that space is not inside a pair of curly braces. \s(?=[a-zA-Z{}]) gives me all the spaces and thus behaves exactly like ' '. How can I exclude the spaces inside the curly braces?
This might help.
>>> import re
>>> line = '{Foo} Bar Bacon {Egg}'
>>> m = re.search(r'}\s+(\S+)\s+', line)
>>> m.group(1)
'Bar'
>>>
I just searched for the first word that follows a closing brace. I used () to group that word so that I could access it later with m.group(1).
If you really want all four elements, try re.findall():
>>> line = '{Foo Goo} Bar Bacon {Egg Foo}'
>>> re.findall(r'{.*?}|\S+', line)
['{Foo Goo}', 'Bar', 'Bacon', '{Egg Foo}']
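The split that the question originally asked about can also be sketched directly. This is a minimal example, assuming braces are always balanced and never nested: the negative lookahead rejects any space that is followed by a closing brace before any other brace, which means the space sits inside a brace group. The names OUTSIDE_BRACES and split_outside_braces are hypothetical, chosen only for illustration.

```python
import re

# Split on a single space only when it is NOT inside {...}.
# (?![^{}]*\}) fails when a '}' is reached before any other brace,
# i.e. when the space is inside a brace group.
# Assumption: braces are balanced and not nested.
OUTSIDE_BRACES = re.compile(r' (?![^{}]*\})')

def split_outside_braces(line):
    """Return the fields of a line, keeping each {...} group whole."""
    return OUTSIDE_BRACES.split(line)

parts = split_outside_braces('{Foo Goo} Bar Bacon {Egg Foo}')
print(parts)     # ['{Foo Goo}', 'Bar', 'Bacon', '{Egg Foo}']
print(parts[1])  # Bar
```

Compared with re.findall(), this keeps the "split and index" shape from the question, at the cost of a pattern that is harder to read.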
You can try {[^}]*}\s(\w+):
>>> import re
>>> print(re.search(r'{[^}]*}\s(\w+)', '{Foo} Bar Bacon {Egg}').group(1))
Bar
Explanation:
{[^}]*} first you match the first section inside curly braces
\s then a whitespace character
(\w+) then the second section; you put it in a capturing group, so it's available in search results as group(1)
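The same breakdown can be kept next to the pattern itself by using re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch of the pattern described above, with the braces escaped for clarity; the variable name pattern and the sample line are illustrative.

```python
import re

# Same pattern as above, written in verbose mode so that each
# piece carries its own explanation.
pattern = re.compile(r"""
    \{[^}]*\}   # the first section, in curly braces
    \s          # a single whitespace character
    (\w+)       # the second section, captured as group 1
""", re.VERBOSE)

m = pattern.search('{Foo Goo} Bar Bacon {Egg}')
print(m.group(1))  # Bar
```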
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
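Because re.search() returns None when nothing matches, calling .group(1) directly on a line in some other format raises AttributeError. A minimal sketch of guarding against that follows; the helper name second_field and the non-matching sample line are assumptions made for illustration, not part of the original posts.

```python
import re

PATTERN = re.compile(r'{[^}]*}\s(\w+)')

def second_field(line):
    """Return the word after the first {...} group, or None if the line does not match."""
    m = PATTERN.search(line)
    # re.search returns None on failure, so check before calling group().
    return m.group(1) if m else None

print(second_field('{Foo} Bar Bacon {Egg}'))  # Bar
print(second_field('no braces here'))         # None
```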