
Convert XML to csv file with Python

I am trying to convert an XML file to CSV. The file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tns:FlxPtn xmlns:ts2c="http://interop.covea.fr/Covea-Flx-TypesS2C-009" xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024" xmlns:rf2c="http://interop.covea.fr/Covea-Referentiel" xmlns:cov="http://interop.covea.fr/Covea-FlxPtn-002" xmlns:fs2c="http://interop.covea.fr/Covea-Flx-EneFncS2C-007" xsi:schemaLocation="http://interop.covea.fr/Covea-Flx-TypesS2C-009 Covea-Flx-TypesS2C-009.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024_MMA_124.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024.xsd http://interop.covea.fr/Covea-Referentiel Covea-Referentiel.xsd http://interop.covea.fr/Covea-FlxPtn-002 Covea-FlxPtn-002.xsd http://interop.covea.fr/Covea-Flx-EneFncS2C-007 Covea-Flx-EneFncS2C-007.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cov:DonEneTch>
    <cov:IdVrsEne>002</cov:IdVrsEne>
    <cov:IdFlx>1V400220191231VIGERIRVESMMA1</cov:IdFlx>
    <cov:TsCraFlx>2020-01-13 10.02.13.000000</cov:TsCraFlx>
    <cov:IdEmtFlx>MMA</cov:IdEmtFlx>
    <cov:IdRctFlx>MMA</cov:IdRctFlx>
    <cov:TyFlx>NOTIF</cov:TyFlx>
    <cov:TyTrtFlx>1</cov:TyTrtFlx>
    <cov:AquFlx>0</cov:AquFlx>
    <cov:EvrExu>PRODUCTION</cov:EvrExu>
    <cov:IdApnEmt>MMA</cov:IdApnEmt>
    <cov:AcnApl>VIGERIRVES</cov:AcnApl>
    <cov:IdVrsFlx>124</cov:IdVrsFlx>
    <cov:IdTrtEmt>Rentes Vie</cov:IdTrtEmt>
    <cov:IdUtl>TTBATCH</cov:IdUtl>
    <cov:VrsCbl></cov:VrsCbl>
    <cov:ChpLbr>1V400220191231VIGERIRVESMMA1377900</cov:ChpLbr>
  </cov:DonEneTch>
  <tns:DonMet>
    <tns:DonEneFnc>
      <fs2c:CodSocJur>1V4002</fs2c:CodSocJur>
      <fs2c:DatArr>20191231</fs2c:DatArr>
      <fs2c:TypFiColl>VIGERIRVES</fs2c:TypFiColl>
      <fs2c:TimStmCreFic>2020-01-13 10.02.13.000000</fs2c:TimStmCreFic>
      <fs2c:CodEns>MMA</fs2c:CodEns>
    </tns:DonEneFnc>

    <tns:PolRntVie>
      <tns:NumEnr>20191290</tns:NumEnr>
      <tns:NumPol>050000111901</tns:NumPol>
      <tns:PMVie>997.75</tns:PMVie>
    </tns:PolRntVie>
    <tns:PolRntVie>
      <tns:NumEnr>20191291</tns:NumEnr>
      <tns:NumPol>050000112002</tns:NumPol>
      <tns:PMVie>4385.15</tns:PMVie>
    </tns:PolRntVie>

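What I want to extract is the last part of the data: "NumEnr", "NumPol" and "PMVie".

I tried adapting some code examples as shown below, but I am not familiar enough with XML to make it work.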

import pandas as pd
from xml.etree import ElementTree as ET
import csv

libname = "C:/Users/a61787/Documents/"

tree = ET.parse(libname+'rente.xml')
with open(libname+'b.csv','w',newline='',encoding='utf8') as sitescope_data:
    csvwriter = csv.writer(sitescope_data)
    col_names = 'Numenr NumPol PMVie'.split()
    csvwriter.writerow(col_names)
    for event in tree.findall('tns:DonMet/tns:PolRntVie'):
        event_data = ['' if (e:=event.find(col)) is None else e.text for col in col_names]
        csvwriter.writerow(event_data)

dataframe = pd.read_csv('b.csv',encoding='utf8')
print(dataframe.shape)

This gives me an empty DataFrame with only the column names.

What I would like to end up with is the following table:

Numenr      NumPol          PMVie
20191290    050000111901    997.75
20191291    050000112002    4385.15

I would appreciate any ideas.

You can do this with BeautifulSoup as follows:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    # The "lxml" HTML parser lowercases tag names, hence the lowercase lookups
    for tns in soup.find_all("tns:polrntvie"):
        csv_output.writerow(tns.find(entry).text for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie'])

This gives you a b.csv file containing:

Numenr,NumPol,PMVie
20191290,050000111901,997.75
20191291,050000112002,4385.15
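
If the file is then read back with pandas, as in the question, note that the NumPol values start with a zero. A small sketch, assuming pandas is installed; an in-memory copy of the output above stands in for b.csv, and the name CSV_TEXT is only illustrative:

```python
import io

import pandas as pd

# In-memory stand-in for the b.csv content shown above
CSV_TEXT = (
    "Numenr,NumPol,PMVie\n"
    "20191290,050000111901,997.75\n"
    "20191291,050000112002,4385.15\n"
)

# With default type inference NumPol would be parsed as an integer and lose
# its leading zero, so that column is read as text instead.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"NumPol": str})

print(df.shape)               # (2, 3)
print(df["NumPol"].tolist())  # ['050000111901', '050000112002']
```

For the real file, replace io.StringIO(CSV_TEXT) with the path to b.csv.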

If some entries may be missing:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        row = []
        
        for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie']:
            try:
                row.append(tns.find(entry).text)
            except AttributeError:
                row.append('')
                
        csv_output.writerow(row)
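
For comparison, the standard-library approach from the question can also be made to work. ElementTree needs a prefix-to-URI mapping to resolve "tns:", and the child lookups need the same prefix and the exact element case ("NumEnr", not "Numenr"). This is a minimal sketch, assuming the URI declared as xmlns:tns in the question and a complete, well-formed file (the excerpt in the question is cut off before the closing tags). The helper name polrntvie_rows and the shortened SAMPLE document are illustrative only:

```python
import csv
import io
from xml.etree import ElementTree as ET

# Prefix-to-URI map; the URI is copied from the xmlns:tns declaration in the question
NS = {"tns": "http://interop.covea.fr/Covea-App-PolRntVie-024"}
FIELDS = ["NumEnr", "NumPol", "PMVie"]  # element names are case-sensitive


def polrntvie_rows(root):
    """Yield one list of field texts per tns:PolRntVie ('' where a field is absent)."""
    for record in root.findall("tns:DonMet/tns:PolRntVie", NS):
        yield [
            record.findtext(f"tns:{name}", default="", namespaces=NS)
            for name in FIELDS
        ]


# Hypothetical, shortened stand-in for rente.xml; the second record lacks PMVie
SAMPLE = """<tns:FlxPtn xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024">
  <tns:DonMet>
    <tns:PolRntVie>
      <tns:NumEnr>20191290</tns:NumEnr>
      <tns:NumPol>050000111901</tns:NumPol>
      <tns:PMVie>997.75</tns:PMVie>
    </tns:PolRntVie>
    <tns:PolRntVie>
      <tns:NumEnr>20191291</tns:NumEnr>
      <tns:NumPol>050000112002</tns:NumPol>
    </tns:PolRntVie>
  </tns:DonMet>
</tns:FlxPtn>"""

root = ET.fromstring(SAMPLE)  # for a file: ET.parse("rente.xml").getroot()
rows = list(polrntvie_rows(root))

buffer = io.StringIO()  # swap for open("b.csv", "w", newline="", encoding="utf-8")
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerows(rows)
print(buffer.getvalue())
```

Using findtext with a default covers missing entries without a try/except block.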
