I'm trying to get data from a pipe in Python. The data is structured like this:
<item><type> data </type><code> data </code><length> data </length><data encoding="base64"> data </data></item>
How do I get the data in between these tags? I've already written a Base64 decoder.
One way is to use the lxml package and parse the raw data as HTML:
from lxml import html
raw_data = '<item><type> data </type><code> data </code><length> data </length><data encoding="base64"> data </data></item>'
html_data = html.fromstring(raw_data)
data = html_data.xpath('//text()')
# data = [' data ', ' data ', ' data ', ' data ']
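Because the input is really XML, the standard library's xml.etree.ElementTree can also parse it, with no third-party package, and it keeps track of which tag each value came from. This is a minimal sketch, assuming each record is a single well-formed <item> whose attribute values use straight ASCII quotes (ElementTree rejects typographic quotes). parse_item is a hypothetical helper name, the sample values are made up, and base64.b64decode stands in for the hand-written decoder mentioned in the question.

```python
import base64
import xml.etree.ElementTree as ET


def parse_item(raw: str) -> dict:
    """Parse one <item> record into a dict keyed by child tag name (hypothetical helper)."""
    item = ET.fromstring(raw)
    # Map each child tag (type, code, length, data) to its whitespace-stripped text.
    fields = {child.tag: (child.text or "").strip() for child in item}
    # The <data> element declares its encoding in an attribute; decode it if it is base64.
    data_elem = item.find("data")
    if data_elem is not None and data_elem.get("encoding") == "base64":
        fields["data"] = base64.b64decode(fields["data"])
    return fields


# Made-up sample record: "aGVsbG8=" is the base64 encoding of b"hello".
raw = '<item><type>text</type><code>42</code><length>5</length><data encoding="base64">aGVsbG8=</data></item>'
print(parse_item(raw))
# {'type': 'text', 'code': '42', 'length': '5', 'data': b'hello'}
```

Looking fields up by tag name avoids depending on the order in which the values appear.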
That may be a bit overkill. Another way is to use a regular expression (the tag-matching pattern is copied from https://kevin.deldycke.com/2008/07/python-ultimate-regular-expression-to-catch-html-tags/):
import re
def get_data(text):
    # Remove every opening/closing tag, then split the remaining text on whitespace
    pattern = r"(?i)<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>"
    return re.sub(pattern, '', text).split()
text = '<item><type> data </type><code> data </code><length> data </length><data encoding="base64"> data </data></item>'
print(get_data(text))
# ['data', 'data', 'data', 'data']
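Stripping the tags and calling split() loses the link between each value and its tag, and it would also break a value that contains spaces. If you stay with regular expressions, a variant that captures the tag name together with its content may be handier. This is a sketch under the assumption that the fields are flat (no nested tags and no "<" inside a value); get_fields and FIELD_RE are hypothetical names, and this is not a general XML parser.

```python
import re

# <tag ...>content</tag>: group 1 is the tag name, group 2 the text between the tags.
# The backreference \1 requires the closing tag to match the opening one, and
# [^<]* keeps the match to innermost elements only (so <item> itself is skipped).
FIELD_RE = re.compile(r"<(\w+)[^>]*>([^<]*)</\1>")


def get_fields(text):
    """Return (tag, stripped_text) pairs for every innermost element (hypothetical helper)."""
    return [(tag, value.strip()) for tag, value in FIELD_RE.findall(text)]


text = '<item><type> data </type><code> data </code><length> data </length><data encoding="base64"> data </data></item>'
print(get_fields(text))
# [('type', 'data'), ('code', 'data'), ('length', 'data'), ('data', 'data')]
```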