
Open multiple XML files and parse them

I need your help. I'm trying to read many XML files from a single folder, and I need to extract some information from each one. All of these XML files have the same structure.

At this point I can read each XML file, but I only capture the information from the last one opened. How can I capture the information from each XML file and save it into a pandas DataFrame?

This is my code:

from os import listdir, path
import xml.etree.ElementTree as ET

mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    print(file)
    tree = ET.parse(file)
    root = tree.getroot()

for docID in root.iter('Folio'):
    Invoice = 'Factura:' + docID.text
    print(Invoice)
for client_rut in root.iter('RUTRecep'):
    Rut = 'Rut:' + client_rut.text
    print(Rut)

And this is my result. It only shows the last file, but I need the information from all three XML files:

/Users/nicolasdiaz/venv/bin/python 
"/Users/nicolasdiaz/PycharmProjects/Marfil/lib/python3.10/Open files.py"
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1877.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1960.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1961.xml
Factura:1961
Rut:93770000-8

Process finished with exit code 0
  1. Move the two bottom for loops into the one above, so that they run once per file, like this:

    from os import listdir, path
    import xml.etree.ElementTree as ET

    mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
    files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

    for file in files:
        print(file)
        tree = ET.parse(file)
        root = tree.getroot()

        # Indented under "for file", so these run for every file, not just the last
        for docID in root.iter('Folio'):
            Invoice = 'Factura:' + docID.text
            print(Invoice)
        for client_rut in root.iter('RUTRecep'):
            Rut = 'Rut:' + client_rut.text
            print(Rut)
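  2. Create an empty list (here called rows) before the for statement and, inside the loop, append one row per file to it:

    rows.append([file, Invoice, Rut])

     After the loop, pd.DataFrame(rows, columns=['file', 'Invoice', 'Rut']) turns the list into a dataframe. Calling df.append(...) on a DataFrame itself is not recommended: it returns a new frame instead of modifying df in place, and it was removed in pandas 2.0.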
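Putting both steps together, here is a minimal sketch of the per-file collection. The helper names `first_text` and `collect_rows` are hypothetical, and the sketch assumes each file contains at most one `Folio` and one `RUTRecep` element that matters (it takes the first match) and that the tags are not namespaced, as in the question's output.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def first_text(root, tag):
    """Return the text of the first element named `tag`, or None if absent."""
    node = next(root.iter(tag), None)
    return node.text if node is not None else None


def collect_rows(folder):
    """Parse every .xml file in `folder` and return one dict per file."""
    rows = []
    # sorted() keeps the row order stable across runs
    for xml_path in sorted(Path(folder).glob('*.xml')):
        root = ET.parse(xml_path).getroot()
        rows.append({
            'file': xml_path.name,
            'Folio': first_text(root, 'Folio'),
            'RUTRecep': first_text(root, 'RUTRecep'),
        })
    return rows
```

With pandas installed, `pd.DataFrame(collect_rows(mypath))` should then give one row per XML file, with the dict keys as column names. Building the list first and creating the frame once at the end also avoids growing a DataFrame row by row.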
