Parsing all elements in XML File to CSV without hardcoding values
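I'd like to know whether there is a way to parse the XML below, pick up most of the tags (including nested ones), and put them into columns and rows without hardcoding.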
<?xml version="1.0" encoding="UTF-8"?>
<faults version="1" xmlns="urn:nortel:namespaces:mcp:faults" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:faults NortelFaultSchema.xsd ">
<family longName="1OffMsgr" shortName="OOM"/>
<family longName="ACTAGENT" shortName="ACAT">
<logs>
<log>
<eventType>RES</eventType>
<number>1</number>
<severity>INFO</severity>
<descTemplate>
<msg>Accounting is enabled upon this NE.</msg>
</descTemplate>
<note>This log is generated when setting a Session Manager's AM from &lt;none&gt; to a valid AM.</note>
<om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group will appear and start counting the recording units sent to the configured AM.
On the configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will appear and start counting the recording units received from this Session Manager's instances.
</om>
</log>
<log>
<eventType>RES</eventType>
<number>2</number>
<severity>ALERT</severity>
<descTemplate>
<msg>Accounting is disabled upon this NE.</msg>
</descTemplate>
<note>This log is generated when setting a Session Manager's AM from a valid AM to &lt;none&gt;.</note>
<action>If you do not intend for the Session Manager to produce accounting records, then no action is required. If you do intend for the Session Manager to produce accounting records, then you should set the Session Manager's AM to a valid AM.</action>
<om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group that matched the previous datafilled AM will disappear.
On the previously configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will disappear.
</om>
</log>
</logs>
</family>
<family longName="ACODE" shortName="AC">
<alarms>
<alarm>
<eventType>ADMIN</eventType>
<number>1</number>
<probableCause>INFORMATION_MODIFICATION_DETECTED</probableCause>
<descTemplate>
<msg>Configured data for audiocode server updated: $1</msg>
<param>
<num>1</num>
<description>AudioCode configuration data got updated</description>
<exampleValue>acgwy1</exampleValue>
</param>
</descTemplate>
<manualClearable></manualClearable>
<correctiveAction>None. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
<alarmName>Audiocode Server Updated</alarmName>
<severities>
<severity>MINOR</severity>
</severities>
</alarm>
<alarm>
<eventType>ADMIN</eventType>
<number>2</number>
<probableCause>CONFIG_OR_CUSTOMIZATION_ERROR</probableCause>
<descTemplate>
<msg>Deployment for audiocode server failed: $1. Reason: $2.</msg>
<param>
<num>1</num>
<description>AudioCode Name</description>
<exampleValue>audcod</exampleValue>
</param>
<param>
<num>2</num>
<description>AudioCode Deployment failed reason</description>
<exampleValue>Failed to parse audiocode configuration data</exampleValue>
</param>
</descTemplate>
<manualClearable></manualClearable>
<correctiveAction>Check the configuration of audiocode server. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
<alarmName>Audiocode Server Deploy Failed</alarmName>
<severities>
<severity>MINOR</severity>
<severity>MAJOR</severity>
</severities>
</alarm>
<alarm>
<eventType>COMM</eventType>
<number>2</number>
<probableCause>LOSS_OF_FRAME</probableCause>
<descTemplate>
<msg>Far end LOF (a.k.a., Yellow Alarm). Trunk (DS1 Number): $1.</msg>
<param>
<num>1</num>
<description>Trunk Number of Trunk with configuration problem</description>
<exampleValue>2</exampleValue>
</param>
</descTemplate>
<clearCondition>Far end is correctly configured for proper framing.</clearCondition>
<correctiveAction>Check that the far end is configured for the proper framing.</correctiveAction>
<alarmName>Far end LOF</alarmName>
<severities>
<severity>CRITICAL</severity>
</severities>
<note>This alarm indicates the Trunk Framing settings on the connected PSTN switch do not match those provisioned on the Audiocodes Mediant 2k.</note>
</alarm>
<alarm>
<eventType>COMM</eventType>
<number>3</number>
<probableCause>LOSS_OF_FRAME</probableCause>
<descTemplate>
<msg>Near end sending LOF Indication. Trunk (DS1 Number): $1.</msg>
<param>
<num>1</num>
<description>Trunk Number of Trunk with configuration problem</description>
<exampleValue>2</exampleValue>
</param>
</descTemplate>
<clearCondition>Gateway is correctly configured for proper framing.</clearCondition>
<correctiveAction>Check that the Audiocodes gateway is configured for the proper framing.</correctiveAction>
<alarmName>Near end sending LOF Indication</alarmName>
<severities>
<severity>CRITICAL</severity>
</severities>
</alarm>
</alarms>
<logs>
<log>
<eventType>ABNORMAL</eventType>
<number>1</number>
<severity>ALERT</severity>
<descTemplate>
<msg>Failed to deploy audiocode server. Server Name: $1, Failed At: $2</msg>
<param>
<num>1</num>
<description>IP address of gateway which failed.</description>
<exampleValue>192.168.0.1</exampleValue>
</param>
<param>
<num>2</num>
<description>One of the following: "Parse Configuration Data","Upload Tone File","Upload Load File" and "Upload Configuration File"</description>
<exampleValue>Parse Configuration Data</exampleValue>
</param>
</descTemplate>
<note>There was a problem during the commissioning/upgrade of a gateway. Either the configuration file was corrupt or files could not be uploaded to the gateway.</note>
<action>Examine the MCS logs as well as the syslogs from the gateway to determine what is causing the problem.</action>
</log>
<log>
<eventType>ABNORMAL</eventType>
<number>2</number>
<severity>ALERT</severity>
<descTemplate>
<msg>Failed to restart audiocode server. Server Name: $1. Exception caught: $2</msg>
<param>
<num>1</num>
<description>Server Long Name</description>
<exampleValue>audiocode_gateway_1</exampleValue>
</param>
<param>
<num>2</num>
<description>Exception occured during restarting the server.</description>
<exampleValue>[example Java exception traceback not given]</exampleValue>
</param>
</descTemplate>
<note>The AudioCodes Gateway was unable to be restarted due to a problem found in the INI file.</note>
<action>Examine the configuration file and the syslogs of the gateway to determine what the configuration error is. Correct this, then restart the server.</action>
</log>
</logs>
</family>
</faults>
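The code below basically does this, but it doesn't pick up the nested elements inside the descTemplate tag. I'd like to find an efficient solution that parses all elements, including nested ones, with no hardcoding (or as little as possible).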
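To make the symptom concrete, here is a minimal sketch using only the standard library's xml.etree.ElementTree (not the lxml code from the question). The trimmed-down `<log>` in `SAMPLE` is a hypothetical fragment made up for illustration. Reading `.text` on direct children returns only whitespace for a container such as descTemplate, while walking all descendants with `iter()` and keeping the childless ones reaches `msg`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down <log> used only for illustration.
SAMPLE = """<log>
  <eventType>RES</eventType>
  <descTemplate>
    <msg>Accounting is enabled upon this NE.</msg>
  </descTemplate>
</log>"""

log = ET.fromstring(SAMPLE)

# Direct children only: a container's .text is just the whitespace that
# precedes its first child, so nothing useful is left after strip().
direct = {child.tag: (child.text or "").strip() for child in log}

# All descendants: keep only leaf elements (those with no children).
leaves = {el.tag: (el.text or "").strip() for el in log.iter() if len(el) == 0}

print(direct)
print(leaves)
```

The first dictionary has an empty descTemplate entry; the second drops the container and picks up msg instead.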
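To elaborate on what the program does: take the eventType tag in my XML as an example. The program creates a column named "eventType" and places the values underneath it. Every "eventType" tag it parses goes into that same column.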
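The column-per-tag behaviour described here can be sketched with csv.DictWriter from the standard library. This is an illustrative alternative, not the code from the earlier question; the two dictionaries in `rows` are hypothetical stand-ins for parsed events. Column names are discovered in first-seen order, and `restval` fills cells for columns a given row never supplied.

```python
import csv
import io

# Hypothetical rows: each dict holds whatever tags one event happened to contain.
rows = [
    {"eventType": "RES", "number": "1", "severity": "INFO"},
    {"eventType": "ADMIN", "number": "2",
     "probableCause": "CONFIG_OR_CUSTOMIZATION_ERROR"},
]

# Discover column names in first-seen order instead of listing them by hand.
header = []
for row in rows:
    for name in row:
        if name not in header:
            header.append(name)

# restval is written into the cells of columns that a row does not have.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead of an in-memory buffer, open it with `newline=''` as the csv module documentation recommends.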
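In a previous, very similar question, tdelaney generously provided the code below. I haven't figured out how to extend it to solve my problem, so I thought I'd ask again - thanks tdelaney: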
import csv
import lxml.etree
from lxml.etree import QName
import operator


class ExpandingTable:
    """A 2 dimensional table where columns are expanded as new column
    types are discovered"""

    def __init__(self):
        """Create table that can expand rows and columns"""
        self.name_to_col = {}
        self.table = []

    def add_column(self, name):
        """Add column named `name` unless already included"""
        if name not in self.name_to_col:
            self.name_to_col[name] = len(self.name_to_col)
            for row in self.table:
                row.append('')

    def add_cell(self, name, value):
        """Add value to named column in the current row"""
        if value:
            self.add_column(name)
            self.table[-1][self.name_to_col[name]] = value.strip().replace("\r\n", " ")

    def new_row(self):
        """Create a new row and make it current"""
        self.table.append([''] * len(self.name_to_col))

    def header(self):
        """Gather discovered column names into a header list"""
        idx_1 = operator.itemgetter(1)
        return [name for name, _ in sorted(self.name_to_col.items(), key=idx_1)]

    def prepend_header(self):
        """Gather discovered column names into a header and
        prepend it to the list"""
        self.table.insert(0, self.header())


def events_to_table(elem):
    """ Builds table from <family> child elements and their contained alarms and
    logs."""
    ns = {"f":"urn:nortel:namespaces:mcp:faults"}
    table = ExpandingTable()
    for family in elem.xpath("f:family", namespaces=ns):
        longName = family.get("longName")
        shortName = family.get("shortName")
        for event in family.xpath("*/*[f:eventType]", namespaces=ns):
            table.new_row()
            table.add_cell("longName", longName)
            table.add_cell("shortName", shortName)
            for cell in event:
                tag = QName(cell.tag).localname
                if tag == "severities":
                    tag = "severity"
                    text = ",".join(severity.text for severity in cell.xpath("*"))
                    print("severities", repr(text))
                else:
                    text = cell.text
                table.add_cell(tag, text)
    table.prepend_header()
    return table.table


def main(filename):
    doc = lxml.etree.parse(filename)
    table = events_to_table(doc.getroot())
    with open('test.csv', 'w', newline='', encoding='utf-8') as fileobj:
        csv.writer(fileobj).writerows(table)


main('OMGroups.xml')
Any help would be greatly appreciated.
Try this.
from simplified_scrapy import SimplifiedDoc, utils


def getKeyValues(nodeCols, dic, header):
    for nodeCol in nodeCols:
        childCols = nodeCol.children
        if childCols:
            getKeyValues(childCols, dic, header)
        else:
            tag = nodeCol.tag
            v = dic.get(tag)
            if v:  # Cases with multiple values
                dic[tag] = v + '|' + nodeCol.text  # Splicing into 1 column
                # i = 1
                # while True:
                #     tag = tag + str(i)
                #     v = dic.get(tag)
                #     if v == None:
                #         dic[tag] = nodeCol.text
                #         break
                #     i = i + 1
            else:
                dic[tag] = nodeCol.text
            if tag not in header:
                header.append(tag)


xml = utils.getFileContent('OMGroups.xml')
doc = SimplifiedDoc(xml)  # create doc
header = ['longName', 'shortName', 'nodeType']  # add column
dicRow = []
# nodes = doc.faults.children.child
parentNodes = doc.faults.children.children  # add
for nodes in parentNodes:  # add
    for node in nodes:  # logs,alarms...
        if not node:
            continue
        family = node.parent
        longName = family['longName']  # get the value
        shortName = family['shortName']
        nodeRows = node.children
        for nodeRow in nodeRows:  # log,log...
            dicCol = {'longName': longName, 'shortName': shortName, 'nodeType': nodeRow.tag}
            nodeCols = nodeRow.children  # eventType,number
            getKeyValues(nodeCols, dicCol, header)
            dicRow.append(dicCol)

# Prepare the data and store it in the csv file
rows = [header]
for dic in dicRow:
    rows.append([dic.get(k) for k in header])
utils.save2csv('test.csv', rows, newline='')
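For anyone who would rather avoid a third-party dependency, the same idea can be sketched with the standard library only. This is not the answerer's code: it is a rough equivalent built on xml.etree.ElementTree, and the helper names `local_name`, `flatten` and `faults_to_rows` as well as the shortened `SAMPLE` document are made up for illustration. It assumes the family / group / event layout shown in the question, recurses into any nested element, and joins repeated tags with `|` in one column.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, row, header):
    """Copy the text of every leaf below `element` into `row`."""
    for child in element:
        if len(child):  # container such as descTemplate or param: recurse
            flatten(child, row, header)
            continue
        name = local_name(child.tag)
        text = " ".join((child.text or "").split())  # collapse whitespace
        if not text:
            continue
        if name in row:  # repeated tag: join the values into one column
            row[name] = row[name] + "|" + text
        else:
            row[name] = text
        if name not in header:
            header.append(name)


def faults_to_rows(root):
    """Build one row per event, discovering column names along the way."""
    header = ["longName", "shortName", "nodeType"]
    rows = []
    for family in root:
        for group in family:  # <logs>, <alarms>
            for event in group:  # <log>, <alarm>
                row = {
                    "longName": family.get("longName", ""),
                    "shortName": family.get("shortName", ""),
                    "nodeType": local_name(event.tag),
                }
                flatten(event, row, header)
                rows.append(row)
    return header, rows


# Hypothetical, shortened document with the same shape as the question's XML.
SAMPLE = """<faults xmlns="urn:nortel:namespaces:mcp:faults">
  <family longName="ACODE" shortName="AC">
    <alarms>
      <alarm>
        <eventType>ADMIN</eventType>
        <descTemplate>
          <msg>Deployment failed: $1. Reason: $2.</msg>
          <param><num>1</num></param>
          <param><num>2</num></param>
        </descTemplate>
        <severities>
          <severity>MINOR</severity>
          <severity>MAJOR</severity>
        </severities>
      </alarm>
    </alarms>
  </family>
</faults>"""

header, rows = faults_to_rows(ET.fromstring(SAMPLE))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, restval="")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

For a real file, `ET.parse('OMGroups.xml').getroot()` would replace `ET.fromstring(SAMPLE)`, and the StringIO buffer would become a file opened with `newline=''`. Joining repeated values with `|` keeps one row per event; numbered columns (num1, num2, ...) are the other obvious choice if each value needs its own cell.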