How to search for a partial substring or regex in XPath in an XML file using Python
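I'm trying to search the content of an XML file for a regex pattern. I'm stuck on how to pass a substring that always ends with a number. That part is dynamic in the XML file, so I don't know how to build the pattern and search for it.
Once the pattern is found, I need to get its child elements, i.e. their attributes and text values.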
XML file content:
<author NAME="PYTHON_DD101">
<type>BOOK</type>
<ID>59</ID>
<inst ID="A">Garry</inst>
<inst ID="B">Gerald</inst>
</author>
<author NAME="PYTHON_ABC4">
<type>BOOK</type>
<SrcID>62</SrcID>
<inst ID="A">Niel</inst>
<inst ID="B">Long</inst>
</author>
Code:
text = "PYTHON"
tmp = '"' + text + "_ABC" + '"'
print(tmp)
#pattern = re.compile('%s\d+'%tmp)
endsWithNumber = re.compile('%s\d$'%tmp)
print(endsWithNumber)
#FoundDetails = Content.find("PYTHON_ABC4")
FoundDetails = Content.find(".//author[@NAME='{}']".format(endsWithNumber))
#regex = re.compile('%s\d+'%tmp)
#matches = regex.match(Content)
#print(matches)
print(type(Content))
print(type(FoundDetails))
print(FoundDetails)
for FoundDetails in FoundDetails.iterfind('author'):
    author = FoundDetails.attrib['NAME']
    print 'author:', author
    for inst in FoundDetails.iterfind('inst'):
        print 'inst id:', inst.attrib['ID'], 'inst name:', inst.text
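Two details in the pattern keep it from matching. First, tmp wraps the text in literal double quotes, which are not part of the attribute value. Second, \d$ allows exactly one trailing digit, so a name like PYTHON_ABC12 would be rejected as well. Separately, find() takes an XPath string whose [@NAME='...'] predicate tests exact equality, so formatting a compiled pattern into it cannot perform regex matching; on Python 2 it inserts the object's repr, which is the <_sre.SRE_Pattern object ...> line in the question's printed output. A small sketch comparing the patterns (Python 3 assumed; the variable names are illustrative):

```python
import re

text = "PYTHON"

# As in the question: the literal double quotes become part of the regex
quoted = re.compile('%s\\d$' % ('"' + text + "_ABC" + '"'))
# No quotes, but still only a single digit allowed before the end
single_digit = re.compile(r'%s_ABC\d$' % re.escape(text))
# No quotes, one or more trailing digits
any_digits = re.compile(r'%s_ABC\d+$' % re.escape(text))

print(bool(quoted.match("PYTHON_ABC4")))         # False: the value has no quotes
print(bool(single_digit.match("PYTHON_ABC12")))  # False: \d$ allows one digit only
print(bool(any_digits.match("PYTHON_ABC4")))     # True
print(bool(any_digits.match("PYTHON_ABC12")))    # True
```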
Output and error I'm getting:
PYTHON_ABC
<_sre.SRE_Pattern object at 0x000000000403F168>
<class 'xml.etree.ElementTree.Element'>
<type 'NoneType'>
None
Traceback (most recent call last):
File "C:\test_Book.py", line 45, in <module>
bookauthor = book.get_Book_by_author(Book)
File "C:\Book.py", line 219, in get_Book_by_author
for FoundDetails in FoundDetails.iterfind('author'):
AttributeError: 'NoneType' object has no attribute 'iterfind'
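The AttributeError follows from find() returning None: no <author> has a NAME exactly equal to the text formatted into the path, and None has no iterfind method. A minimal sketch, assuming Python 3 and ElementTree (the <root> wrapper is hypothetical, added so the snippet parses as one document), that checks for None before using the result. It also shows that the element find() returns is the <author> itself, so iterating over its 'author' children finds nothing, while its <inst> children can be read directly:

```python
import xml.etree.ElementTree as ET

# The second <author> from the question, wrapped in a hypothetical <root>
data = """<root>
<author NAME="PYTHON_ABC4">
<type>BOOK</type>
<SrcID>62</SrcID>
<inst ID="A">Niel</inst>
<inst ID="B">Long</inst>
</author>
</root>"""

root = ET.fromstring(data)

# find() returns None when nothing matches the exact attribute value
missing = root.find(".//author[@NAME='PYTHON_ABC']")
print(missing is None)  # True

found = root.find(".//author[@NAME='PYTHON_ABC4']")
if found is not None:
    # found is the <author> element itself, so it has no <author> children
    print(len(found.findall('author')))  # 0
    # Its <inst> children can be iterated directly
    for inst in found.iterfind('inst'):
        print('inst id:', inst.get('ID'), 'inst name:', inst.text)
```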
Expected output:
inst id: A inst name: Niel
inst id: B inst name: Long
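If I pass the exact NAME value, i.e. "PYTHON_ABC4", in the line below, it works. But I don't want to hard-code the value, because the file may contain other instances whose names follow the same pattern, e.g. "PYTHON_ABC12", and I want to get those books' details too.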
FoundDetails = Content.find(".//author[@NAME='{}']".format("PYTHON_ABC4"))
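ElementTree's XPath subset only supports exact equality tests such as [@NAME='PYTHON_ABC4'] (no regex and no starts-with()), so one way to handle names like PYTHON_ABC4 and PYTHON_ABC12 is to select every <author> and filter NAME with a regex in Python. A minimal sketch, assuming Python 3.4+ for re.fullmatch; find_authors, the <root> wrapper and the PYTHON_ABC12 entry are hypothetical, added for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Sample data from the question plus a hypothetical PYTHON_ABC12 author,
# wrapped in a hypothetical <root> so it parses as one document
data = """<root>
<author NAME="PYTHON_DD101">
<type>BOOK</type>
<ID>59</ID>
<inst ID="A">Garry</inst>
<inst ID="B">Gerald</inst>
</author>
<author NAME="PYTHON_ABC4">
<type>BOOK</type>
<SrcID>62</SrcID>
<inst ID="A">Niel</inst>
<inst ID="B">Long</inst>
</author>
<author NAME="PYTHON_ABC12">
<type>BOOK</type>
<inst ID="A">Placeholder</inst>
</author>
</root>"""


def find_authors(root, prefix):
    """Return <author> elements whose NAME is prefix followed by digits only.

    Hypothetical helper, not a library function.
    """
    pattern = re.compile(re.escape(prefix) + r'\d+')
    # The XPath predicate can't do this, so filter the elements in Python
    return [a for a in root.iter('author') if pattern.fullmatch(a.get('NAME', ''))]


root = ET.fromstring(data)
for author in find_authors(root, "PYTHON_ABC"):
    print('author:', author.get('NAME'))
    for inst in author.iterfind('inst'):
        print('inst id:', inst.get('ID'), 'inst name:', inst.text)
```

With this data the loop prints the two PYTHON_ABC authors and their inst children and skips PYTHON_DD101. If lxml is available, its full XPath 1.0 support includes starts-with() and the EXSLT regular-expression extension, which would let the filter live inside the XPath expression instead.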
Answer: I modified your code slightly to get the desired output. Hope it helps.
data='''
<PARAMETER-VALUES>
<author NAME="PYTHON_DD11">
<type>BOOK</type>
<ID>59</ID>
<inst ID="A">Garry</inst>
<inst ID="B">Gerald</inst>
</author>
<author NAME="PYTHON_ABC4">
<type>BOOK</type>
<SrcID>62</SrcID>
<inst ID="A">Niel</inst>
<inst ID="B">Long</inst>
</author>
</PARAMETER-VALUES>
'''
# ElementTree to parse the XML data
import xml.etree.ElementTree as ET
import re
root = ET.fromstring(data)
# Return True if the string contains at least one digit
def hasnumbers(result):
    return bool(re.search(r'\d', result))
for author in root.iter('author'):
    # Default to '' so an <author> without NAME doesn't pass None to re.search
    result = author.attrib.get('NAME', '')
    if hasnumbers(result):
        for inst in author.iterfind('inst'):
            print('inst id: {} inst name: {}'.format(inst.attrib.get('ID'), inst.text))
Output:
inst id: A inst name: Garry
inst id: B inst name: Gerald
inst id: A inst name: Niel
inst id: B inst name: Long
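The output above also includes Garry and Gerald, which the expected output leaves out: hasnumbers accepts any NAME containing a digit, so PYTHON_DD11 qualifies too. A minimal sketch that also checks the prefix, using plain string methods instead of a regex; name_matches is a hypothetical helper and the code assumes Python 3:

```python
import xml.etree.ElementTree as ET

# The answer's sample data (leading newline dropped)
data = '''<PARAMETER-VALUES>
<author NAME="PYTHON_DD11">
<type>BOOK</type>
<ID>59</ID>
<inst ID="A">Garry</inst>
<inst ID="B">Gerald</inst>
</author>
<author NAME="PYTHON_ABC4">
<type>BOOK</type>
<SrcID>62</SrcID>
<inst ID="A">Niel</inst>
<inst ID="B">Long</inst>
</author>
</PARAMETER-VALUES>'''


def name_matches(name, prefix):
    # Hypothetical helper: True if name is prefix followed only by digits,
    # e.g. "PYTHON_ABC4" or "PYTHON_ABC12", but not "PYTHON_DD11"
    suffix = name[len(prefix):]
    return name.startswith(prefix) and suffix.isdigit()


root = ET.fromstring(data)
for author in root.iter('author'):
    if name_matches(author.get('NAME', ''), "PYTHON_ABC"):
        for inst in author.iterfind('inst'):
            print('inst id:', inst.get('ID'), 'inst name:', inst.text)
```

This prints only the Niel and Long lines from the expected output. Note that str.isdigit() also returns True for some non-ASCII characters such as '²'; an explicit [0-9]+ regex is stricter if that matters.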