How to parse text from a single line of XML using Python

Question

I have a single line of XML and would like to parse all of its text parts into a list of strings.

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

My desired output:

desired_output = ['Finishing', '%d', 'percent.']

I used a regular expression for this simple task.

import re
pattern = re.compile(r'>.+<')
match = re.findall(pattern, text)

match = ['>Finishing <xliff:g id="number">%d</xliff:g> percent.<']

It seems the regular expression doesn't produce my desired output.
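A likely cause, sketched below with the standard re module: ".+" is greedy, so the pattern matches from the first ">" all the way to the last "<" on the line and returns a single match. A lazy quantifier (".+?") or a class that excludes "<" ("[^<]+") stops each match at the next tag, and a capture group keeps only the text between the delimiters. The capture groups and the final strip() are illustrative additions, not part of the original question.

```python
import re

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# Greedy: ".+" runs to the LAST "<" on the line, so there is one big match.
greedy = re.findall(r'>.+<', text)

# Lazy: ".+?" stops at the NEXT "<"; the group keeps only the text between tags.
lazy = re.findall(r'>(.+?)<', text)

# Excluding "<" from the repeated class has the same effect without laziness.
negated = re.findall(r'>([^<]+)<', text)

print(greedy)   # ['>Finishing <xliff:g id="number">%d</xliff:g> percent.<']
print(lazy)     # ['Finishing ', '%d', ' percent.']
print(negated)  # ['Finishing ', '%d', ' percent.']

# Strip surrounding whitespace to get the list asked for above.
print([piece.strip() for piece in negated])  # ['Finishing', '%d', 'percent.']
```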

Answer 1

I don't know Python well, but I do know that parsing XML with regular expressions is setting yourself up for a world of pain. Try something like this using ElementTree instead (tested in Python 2.7):

import xml.etree.cElementTree as ElementTree
xml_text='<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'
xml=ElementTree.fromstring('<data xmlns:xliff="foo">' + xml_text + '</data>')
print ElementTree.tostring(xml, method='text')

Output:

Finishing %d percent.
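The snippet above uses Python 2 syntax (the print statement) and xml.etree.cElementTree, which was deprecated in Python 3.3 and removed in Python 3.9. A minimal sketch of the same approach, assuming Python 3: import xml.etree.ElementTree instead and pass encoding='unicode' so that tostring() returns a str rather than bytes. The wrapper element name data and the placeholder namespace URI foo are kept from the answer.

```python
import xml.etree.ElementTree as ElementTree

xml_text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# Wrap the fragment in an element that declares the xliff prefix;
# otherwise the parser rejects "xliff:g" as an unbound prefix.
root = ElementTree.fromstring('<data xmlns:xliff="foo">' + xml_text + '</data>')

# method='text' serializes only the text content; encoding='unicode'
# makes tostring() return a str instead of bytes.
plain = ElementTree.tostring(root, encoding='unicode', method='text')
print(plain)  # Finishing %d percent.
```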

Note: because the XML uses a namespace prefix (xliff:) that the snippet never declares, it has to be wrapped in an element that declares it. If your actual XML already declares the namespace, the wrapper can be skipped.
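The question asks for a list rather than one joined string. A sketch, assuming Python 3 and the same wrapper trick: Element.itertext() yields each text and tail fragment in document order, so stripping the fragments and dropping empty ones gives the list. The helper name text_parts and the urn:placeholder namespace URIs are hypothetical; substitute whatever your real file declares.

```python
import xml.etree.ElementTree as ElementTree

def text_parts(fragment, prefixes=('xliff',)):
    """Return the stripped, non-empty text pieces of an XML fragment.

    Hypothetical helper: wraps the fragment in a <data> element that
    declares each prefix with a placeholder URI so the parser accepts it.
    """
    declarations = ' '.join(f'xmlns:{p}="urn:placeholder:{p}"' for p in prefixes)
    root = ElementTree.fromstring(f'<data {declarations}>{fragment}</data>')
    # itertext() yields element text and tail strings in document order.
    return [piece.strip() for piece in root.itertext() if piece.strip()]

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'
print(text_parts(text))  # ['Finishing', '%d', 'percent.']
```

Unlike the regex approaches, this relies on a real XML parser, so nested or adjacent tags do not leak markup into the result.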

Answer 2

Update your regex to this:

pattern = re.compile(r'.*?>(.+?)<')

If you are working with XML/HTML parsing, you might consider using BeautifulSoup; it will save you a great deal of time you would otherwise spend writing more regexes. But if you want to learn regex, it comes down to trial and error.
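Applied to the question's input, the pattern from this answer, written here as r'.*?>(.+?)<', captures the text between tags but keeps the surrounding spaces, so the pieces still need strip() to match the desired output. The sketch below, using only the standard re module, also shows a limitation that supports Answer 1's warning: when two tags are adjacent, ".+?" must consume at least one character, so it can swallow a tag.

```python
import re

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# The regex from this answer: skip up to a ">", then lazily capture up to the next "<".
pattern = re.compile(r'.*?>(.+?)<')

raw = pattern.findall(text)
print(raw)  # ['Finishing ', '%d', ' percent.']

# Strip whitespace to match the desired output from the question.
desired_output = [piece.strip() for piece in raw]
print(desired_output)  # ['Finishing', '%d', 'percent.']

# Limitation: with adjacent tags, ".+?" needs at least one character,
# so it consumes "<b>" and returns markup instead of text.
print(pattern.findall('<a><b>x</b></a>'))  # ['<b>x']
```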

2 answers

Answer 1
Score 0, posted 2017-04-06 04:16:10

Answer 2
Score -1, accepted, posted 2017-04-06 03:55:47