Python XML Parsing With Minidom Using Exception Handling
I am in the process of stripping sensitive data from a couple million XML documents. How can I add a try/except to get around this error, which seems to be caused by a few malformed XML documents in the batch?
xml.parsers.expat.ExpatError: mismatched tag: line 1, column 28691
#!/usr/bin/python
import sys
from xml.dom import minidom

def getCleanString(word):
    str = ""
    dummy = 0
    for character in word:
        try:
            character = character.encode('utf-8')
            str = str + character
        except:
            dummy += 1
    return str

def parsedelete(content):
    dom = minidom.parseString(content)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()

for line in sys.stdin:
    if line > 1:
        line = line.strip()
        line = line.split(',', 2)
        if len(line) > 2:
            partition = line[0]
            id = line[1]
            xml = line[2]
            xml = getCleanString(xml)
            xml = parsedelete(xml)
            strng = '%s\t%s\t%s' %(partition, id, xml)
            sys.stdout.write(strng + '\n')
Catching exceptions is straightforward. Add import xml to your import statements and wrap the problem code in a try/except handler.
def parsedelete(content):
    try:
        dom = minidom.parseString(content)
    except xml.parsers.expat.ExpatError as e:
        # Not sure how you want to handle the error, so this just passes it back as a string.
        return str(e)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()
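Returning the error text means it ends up in the XML column of the output. If you would rather drop malformed records and log them, here is a rough sketch of one way to do it. The names parse_delete and clean_records are made up for this illustration and are not from the question. The getCleanString step is left out for brevity, and the code uses the "except ... as e" form, which works on Python 2.6+ and Python 3.

```python
import sys
from xml.dom import minidom
from xml.parsers.expat import ExpatError

TAG = 'RI_RI51_ChPtIncAcctNumber'  # tag name taken from the question


def parse_delete(content):
    """Return content with every TAG element removed, or None if it is malformed."""
    try:
        dom = minidom.parseString(content)
    except ExpatError as e:
        # ExpatError carries the position of the failure in lineno and offset.
        sys.stderr.write('skipping malformed XML (line %d, column %d): %s\n'
                         % (e.lineno, e.offset, e))
        return None
    for element in dom.getElementsByTagName(TAG):
        element.parentNode.removeChild(element)
    return dom.toxml()


def clean_records(lines):
    """Yield tab-separated rows, skipping short lines and unparseable XML."""
    for line in lines:
        fields = line.strip().split(',', 2)
        if len(fields) < 3:
            continue  # not a partition,id,xml record
        partition, record_id, xml_text = fields
        cleaned = parse_delete(xml_text)
        if cleaned is None:
            continue  # malformed XML was already reported on stderr
        yield '%s\t%s\t%s' % (partition, record_id, cleaned)
```

To use it in place of the original loop, iterate over clean_records(sys.stdin) and write each row followed by a newline to sys.stdout. The skipped records show up on stderr, so you can redirect that stream to a file and inspect the bad inputs later.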