Read all lines between two strings
I want to extract the lines between two tags in my XML file. Here is an example:
<userData code="viPartListRailML" value="1">
<partRailML s="0.0000000000000000e+00" id="0"/>
<partRailML s="2.0000000000000000e+01" id="1"/>
<partRailML s="9.4137883373059267e+01" id="2"/>
</userData>
Here is the code I am trying:
import re

shakes = open("N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")

for x in shakes:
    if "userData" in x:
        print x
        continue
    if "/userData" in x:
        break
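The problem is that it still only returns the lines that contain <userData or </userData>.
How can I modify it to get the lines between these two "words"?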
Assuming your file contains a single <userData> block, you can extract the lines inside the block like this:
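shakes = open("./file.xml", "r")
inblock = False
for x in shakes:
    # Closing tag: turn the flag off before this line could be printed
    if "</userData" in x:
        inblock = False
    if inblock:
        print(x)
    # Opening tag: "<userData" does not match "</userData>",
    # so the flag is not switched back on by the closing line
    if "<userData" in x:
        inblock = True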
However, reading the file with an XML parser is more robust, for example:
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
for data in tree.getroot().iter('userData'):
    for child in data:
        print(ET.tostring(child))
        # or something else, e.g.:
        # print(child.tag)
By the way, use Python 3 whenever possible; Python 2 has been retired.
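As a rough illustration of what the parser-based approach buys you, the sketch below reads the attributes of each child element instead of printing raw lines. It uses the sample from the question, wrapped in a hypothetical <root> element so that it is a complete document; the helper name part_positions is made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample from the question; the <root> wrapper is an assumption for this demo.
SAMPLE = """<root>
<userData code="viPartListRailML" value="1">
<partRailML s="0.0000000000000000e+00" id="0"/>
<partRailML s="2.0000000000000000e+01" id="1"/>
<partRailML s="9.4137883373059267e+01" id="2"/>
</userData>
</root>"""


def part_positions(xml_text):
    """Return (id, s) pairs for every partRailML child of a userData element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for data in root.iter('userData'):
        for child in data.findall('partRailML'):
            # Attributes come back as strings, so convert them explicitly.
            pairs.append((int(child.get('id')), float(child.get('s'))))
    return pairs


print(part_positions(SAMPLE))
```

Unlike the line-based loops, this does not depend on how the XML happens to be split across lines.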
A simple way is to add a variable that tells you whether you are between these words:
# Raw string, so the backslashes in the Windows path are taken literally
shakes = open(r"N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
t = False
for x in shakes:
    if t:
        print(x)  # the </userData> line is printed as well
    if "/userData" in x:
        t = False
    elif "userData" in x:  # "userData" also matches "/userData", hence the elif
        t = True
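The loop above also prints the closing line. If neither marker line is wanted, one possible variation is a small generator that checks the markers first and only yields lines when neither matches. This is a sketch, not the answerer's code: the name lines_between and its default markers are assumptions, and a block whose opening and closing tags sit on the same line is simply skipped.

```python
def lines_between(lines, start="<userData", end="</userData"):
    """Yield lines strictly between a line containing `start`
    and the next line containing `end` (both marker lines excluded)."""
    inside = False
    for line in lines:
        if end in line:
            inside = False        # closing marker: stop, do not yield it
        elif start in line:
            inside = True         # opening marker: start after this line
        elif inside:
            yield line.rstrip("\n")


sample = [
    "<other/>\n",
    '<userData code="viPartListRailML" value="1">\n',
    '<partRailML s="0.0000000000000000e+00" id="0"/>\n',
    '<partRailML s="2.0000000000000000e+01" id="1"/>\n',
    "</userData>\n",
    "<trailing/>\n",
]

for line in lines_between(sample):
    print(line)
```

Because it accepts any iterable of lines, the same function can be given an open file object instead of the list.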
You can use itertools.dropwhile to reach the <userData part, then itertools.takewhile to read until </userData:
import itertools as it

# text is assumed to hold the whole file contents as a single string
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    )
)
print('\n'.join(result))
If you want to skip the <userData element itself, you can add itertools.islice:
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.islice(it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    ), 1, None)
)
print('\n'.join(result))
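The snippets above assume a string named text. The same itertools chain can, as far as I can tell, be applied directly to an open file, since a file object iterates line by line. The sketch below uses io.StringIO as a stand-in for a real file, and block_lines is a hypothetical helper name.

```python
import io
import itertools as it


def block_lines(handle):
    """Return the lines between the first <userData line and the next
    </userData line, read lazily from a file-like object."""
    # Drop everything before the opening tag, then skip the tag line itself.
    after_open = it.islice(
        it.dropwhile(lambda line: '<userData' not in line, handle),
        1, None)
    # Keep lines until the closing tag shows up.
    body = it.takewhile(lambda line: '</userData' not in line, after_open)
    return [line.rstrip('\n') for line in body]


# Stand-in for open("file.xml"); the contents follow the question's sample.
fake_file = io.StringIO(
    '<header/>\n'
    '<userData code="viPartListRailML" value="1">\n'
    '<partRailML s="2.0000000000000000e+01" id="1"/>\n'
    '</userData>\n'
    '<footer/>\n'
)
print(block_lines(fake_file))
```

With a real file, wrapping the call in a with open(...) block keeps the handle closed afterwards; reading stops as soon as the closing tag is reached.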