Extract string using regex

How can I extract the content (how are you) from this string:

<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>. 

Can I use a regex for this? If so, what is a suitable regex?

Note: I don't want to use the split function to extract the result. Also, can you suggest some links for a beginner to learn regex?

I am using Python 2.7.2.

You could use a regular expression for this (as Joey demonstrates).

However, if your XML document is any bigger than this one-liner, a regular expression will not be reliable, since XML is not a regular language.

Use BeautifulSoup (or another XML parser) instead:

>>> from BeautifulSoup import BeautifulSoup
>>> xml_as_str = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>. '
>>> soup = BeautifulSoup(xml_as_str)
>>> print soup.text
how are you.

Or...

>>> for string_tag in soup.findAll('string'):
...     print string_tag.text
... 
how are you
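
As a sketch of the "another XML parser" route, the standard library's xml.etree.ElementTree can read the same element. The helper name element_text is made up for illustration, and the sample string drops the trailing ". " from the question, because ElementTree rejects text after the root element.

```python
import xml.etree.ElementTree as ET

def element_text(xml_as_str):
    # Hypothetical helper: parse a single XML element and return its text.
    root = ET.fromstring(xml_as_str)
    return root.text

# Assumption: the trailing ". " from the question has been removed,
# since it is not part of a well-formed XML document.
xml_as_str = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>'
print(element_text(xml_as_str))
```

The namespace only affects the tag name here (root.tag carries the namespace URI in braces), not the text.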

(?<=<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">)[^<]+(?=</string>)

would match what you want, as a trivial example.

(?<=>)[^<]+

would, too. It all depends a bit on how your input is formatted exactly.
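
In Python, lookaround patterns like these can be tried with re.search, roughly as below. This is only a sketch: the names STRICT, LOOSE and first_match are made up for illustration, the dots in the URL are escaped (an optional tightening), and the loose variant looks behind for a closing ">".

```python
import re

s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>. '

# Strict: the text must sit between the exact opening tag and </string>.
STRICT = (r'(?<=<string xmlns="http://schemas\.microsoft\.com/2003/10/Serialization/">)'
          r'[^<]+(?=</string>)')

# Loose: any run of non-"<" characters that directly follows a ">".
LOOSE = r'(?<=>)[^<]+'

def first_match(pattern, text):
    # Hypothetical helper: return the first match, or None if there is none.
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(STRICT, s))
print(first_match(LOOSE, s))
# The loose pattern also picks up the text after the closing tag.
print(re.findall(LOOSE, s))
```

Both patterns find "how are you" first, but re.findall with the loose one also returns the ". " that follows the closing tag, which is why the exact input format matters.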

Try using the following regular expression:

/<[^>]*>(.*?)</

This will match the content of a generic HTML tag. To match a specific tag, use the following (replace "string" with the tag you want to match):

/<string[^<]*>(.*?)<\/string>/i

(The trailing i flag makes the match case-insensitive.)
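
Python has no /.../i literal syntax, so the same idea would be written with re.IGNORECASE and a capture group. A minimal sketch, where extract_tag_text is a hypothetical helper name rather than anything from the answer:

```python
import re

def extract_tag_text(text, tag):
    # Hypothetical helper: build the tag-specific pattern and return
    # capture group 1 (the text between the tags), or None if no match.
    name = re.escape(tag)
    pattern = re.compile(r'<%s[^<]*>(.*?)</%s>' % (name, name), re.IGNORECASE)
    match = pattern.search(text)
    return match.group(1) if match else None

s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>. '
print(extract_tag_text(s, 'string'))
```

Note that (.*?) does not cross line breaks unless re.DOTALL is added to the flags.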
