
Parse an XML file in Python

I have an XML file that I need to read, saving each column to an Excel file. Can someone please help?

There are some lines after the DECLARE statement, but I only want to parse from table1 to /table1. Can someone help me?

<?xml version="1.0" encoding="Metadata" ?>
    <DECLARE  lmsid ="asdhgh"
     ...........
    </table1 name ="employee table" name ="E1 Enterprises" refid ="201"
     <data id = "ABC" emp = "dt">
     <country id ="m1" name =dt1">
     <rank text> "data"</rank text>
     <rank textd> "direction"</rank textd>
     <reference>
     <ref id ="9900m" id1="1000" ref="URL">
     </reference>
     </country>
    <data id = "xyz" emp = "dt1">
    <country id ="m2" name =dt2">
    <rank text> "data1"</rank text>
    <rank textd> "direction1"</rank textd>
    <reference>
    <ref id ="9900m" id1="2000" ref="URL">
    </reference>
    </country>
    </data id>
    ....
    </table1>
    </table1 name ="Manager table" name ="E1 Enterprises" refid ="202"
    <data id = "ARZ" emp = "dt">
    <country id ="m1" name =dt1">
     <rank text> "data"</rank text>
     <rank textd> "direction"</rank textd>
     <reference>
     <ref id ="9900m" id1="1000" ref="URL">
     </reference>
     </country>
     <data id = "QNC" emp = "dt1">
     <country id ="m2" name =dt2">
     <rank text> "data1"</rank text>
     <rank textd> "direction1"</rank textd>
     <reference>
     <ref id ="9900m" id1="2000" ref="URL">
     </reference>
     </country>
     </data id>
      ....
     </table1>
...

Thanks, Aarush

I think you can just use BeautifulSoup to parse the XML. I found this snippet of code online:

# Import BeautifulSoup
from bs4 import BeautifulSoup

# Read the whole XML file into a single string
with open("sample.xml", "r") as file:
    content = file.read()

# Parse the string with the lxml parser (requires the lxml package)
soup = BeautifulSoup(content, "lxml")

# Do things with soup

BS4 can find XML tags pretty easily. Its docs are extensive, but you would use something like soup.find('data', id='xyz') if you were looking for that element. Then just export to CSV with pandas or the csv module.
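The export step can be sketched with the standard csv module. This is only an illustration: the `rows` list and its column names are made-up stand-ins for values already pulled out of the XML, and `rows_to_csv` is a hypothetical helper, not part of any library.

```python
import csv
import io

# Hypothetical rows, standing in for values already extracted from the XML.
rows = [
    {"data_id": "ABC", "country_id": "m1", "rank": "data"},
    {"data_id": "xyz", "country_id": "m2", "rank": "data1"},
]

def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows, ["data_id", "country_id", "rank"])
print(csv_text)
```

To write a real file instead, open it with `open(path, "w", newline="")` and pass that file object to `csv.DictWriter`. Excel can open the resulting .csv directly.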

Not sure what you mean by "save each column". An XML element has a tag name, attributes, and text.
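To make those three parts concrete, here is a small sketch using the standard library's xml.etree.ElementTree. The fragment is a hypothetical, well-formed simplification of the question's sample, and `describe` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the question's sample.
SAMPLE = '<country id="m1" name="dt1"><rank>data</rank></country>'

def describe(element):
    """Return the three parts of an element: tag name, attributes, text."""
    text = (element.text or "").strip()
    return element.tag, dict(element.attrib), text

country = ET.fromstring(SAMPLE)
print(describe(country))               # has attributes, no text of its own
print(describe(country.find("rank")))  # has text, no attributes
```

Which of these parts should become a "column" is the decision the question leaves open.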

You can use the xml.dom.minidom module:

>>> import xml.dom.minidom
>>> s = '<t><a name="1"></a><a name="2"></a></t>'
>>> x = xml.dom.minidom.parseString(s)
>>> a = x.getElementsByTagName("a")
>>> for i in a:
...     print(i.getAttribute("name"))
...
1
2

You can also parse an .xml file (use a raw string so the backslash in the path is not treated as an escape): x = xml.dom.minidom.parse(r"c:\xmlFile.xml")

See more detail in the documentation: https://docs.python.org/2/library/xml.dom.minidom.html
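Applied to the question, the same minidom calls can collect one row per data element inside each table1. This is a sketch under assumptions: the sample in the question is not well-formed, so the XML below is a hypothetical cleaned-up version, and the `root` wrapper, the `table_rows` helper, and the chosen row keys are all illustrative.

```python
import xml.dom.minidom

# Hypothetical cleaned-up version of the question's data; the original
# sample is not well-formed, so this structure is an assumption.
SAMPLE = """<root>
  <table1 name="employee table" refid="201">
    <data id="ABC" emp="dt">
      <country id="m1" name="dt1">
        <rank>data</rank>
      </country>
    </data>
    <data id="xyz" emp="dt1">
      <country id="m2" name="dt2">
        <rank>data1</rank>
      </country>
    </data>
  </table1>
</root>"""

def table_rows(xml_text):
    """Return one dict per <data> element found inside any <table1>."""
    doc = xml.dom.minidom.parseString(xml_text)
    rows = []
    for table in doc.getElementsByTagName("table1"):
        for data in table.getElementsByTagName("data"):
            country = data.getElementsByTagName("country")[0]
            rank = country.getElementsByTagName("rank")[0]
            rows.append({
                "table": table.getAttribute("name"),
                "data_id": data.getAttribute("id"),
                "country_id": country.getAttribute("id"),
                "rank": rank.firstChild.data.strip(),
            })
    return rows

for row in table_rows(SAMPLE):
    print(row)
```

Because the loop starts from getElementsByTagName("table1"), anything outside the table1 elements, such as the lines after DECLARE, is ignored. The real file would have to be made well-formed before minidom can parse it.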

Once you have the values you want to save into your Excel file, you can run a SQL statement with pyodbc and the Microsoft ODBC driver (Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)):

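import pyodbc

# Raw string so the backslash in the Windows path is not treated as an escape.
# autocommit=True is typically needed because the Excel driver does not support transactions.
connection = pyodbc.connect(r"Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; readonly=0; DBQ=C:\yourfileName.xlsx", autocommit=True)

cursor = connection.cursor()
# val1 and val2 are the values extracted from the XML; pass them as
# parameters instead of embedding them in the SQL text
sql = "insert into [Sheet1$] (col1, col2) values (?, ?)"
cursor.execute(sql, (val1, val2))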
