Python Regex: What's wrong with this?

I am trying to write a regex that extracts just the error code from this XML.

>>> import re
>>> re_code = re.compile(r'<errorcode>([0-9]+)</errorcode>', re.MULTILINE)
>>> re_code.match('''<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
... <methoderesponse>
...     <status>
...         <message/>
...         <errorcode>515</errorcode>
...         <value>ERROR</value>
...     </status>
... </methoderesponse>
... ''')

It should be quite easy, but I don't understand why it doesn't match.

.match() only attempts to match at the start of the string. You want .search(), or more likely .findall().
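A minimal sketch of the difference, using the XML from the question. The variable names (`xml_text`, `match_result`, and so on) are only illustrative, and the re.MULTILINE flag is left out here because it plays no role in the result:

```python
import re

# The XML from the question; xml_text is an illustrative name.
xml_text = """<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<methoderesponse>
    <status>
        <message/>
        <errorcode>515</errorcode>
        <value>ERROR</value>
    </status>
</methoderesponse>
"""

re_code = re.compile(r'<errorcode>([0-9]+)</errorcode>')

# match() anchors at position 0, where the string begins with '<?xml', so it fails.
match_result = re_code.match(xml_text)

# search() scans forward and returns the first match anywhere in the string.
search_result = re_code.search(xml_text)

# findall() returns the captured group of every match as a list of strings.
all_codes = re_code.findall(xml_text)

print(match_result)            # None
print(search_result.group(1))  # 515
print(all_codes)               # ['515']
```

Note that .match() returns None rather than raising, which is why the interactive session in the question prints nothing at all.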

Have a look at an XML parser, though. It is much nicer to use XPath or an equivalent to get your data, and it will handle nuances that regexes won't.

An example that works with your sample XML:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.fromstring(text)  # text holds the XML string from the question
>>> tree.findall('.//errorcode')[0].text
'515'

The ElementTree documentation has more information, and I would personally check out lxml.
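As a self-contained variation on the snippet above, here is a sketch that uses `findtext()` instead of indexing into the result of `findall()`, so that a missing element yields a default value rather than an IndexError. The names `xml_text`, `root`, and the tag `nosuchtag` are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# The XML from the question; xml_text is an illustrative name.
xml_text = """<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<methoderesponse>
    <status>
        <message/>
        <errorcode>515</errorcode>
        <value>ERROR</value>
    </status>
</methoderesponse>
"""

root = ET.fromstring(xml_text)

# findtext() returns the text of the first matching element,
# or the given default when nothing matches.
error_code = root.findtext('.//errorcode')
missing = root.findtext('.//nosuchtag', default='n/a')  # hypothetical tag, not in the XML

print(error_code)  # 515
print(missing)     # n/a
```

The value comes back as a string, so convert it with `int(error_code)` if a number is needed.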

As @Jon Clements said, .match() only succeeds if the pattern matches at the very beginning of the string; .search() scans the string for the first occurrence, and .findall() returns all occurrences.

Regardless of that, you could rewrite your regular expression in a slightly more readable form:

regex = re.compile(r'<errorcode>(\d+)</errorcode>')

You don't need the re.MULTILINE argument. It only changes how ^ and $ behave, and this pattern uses neither, so it has no bearing on the problem.
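For context, a small sketch of what re.MULTILINE does change. The sample string `lines` is made up for illustration and is not part of the question:

```python
import re

# Hypothetical line-oriented input, used only to show the flag's effect.
lines = "status: ok\nerrorcode: 515\n"

# Without re.MULTILINE, ^ only matches at the very start of the string.
without_flag = re.findall(r'^errorcode: (\d+)$', lines)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
with_flag = re.findall(r'^errorcode: (\d+)$', lines, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['515']
```

Since the pattern in the question contains no ^ or $, passing the flag or omitting it gives the same result there.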
