How to write CSV from XML response in Python?

With the following HTTP request:


import requests
import csv

url = 'http://www.culturaitalia.it/oaiProviderCI/OAIHandler?verb=ListRecords&metadataPrefix=pico&set=collezione_pansa_villa_frigerj'

e = requests.get(url)

data = e.text

print(data)

I get this XML as output:

<record><header><identifier>oai:culturaitalia.it:oai:culturaitalia.it:museiditalia-work_46880</identifier><datestamp>2018-08-29T17:56:41Z</datestamp><setSpec>museiditalia_opere</setSpec><setSpec>opere_museid</setSpec><setSpec>Beni_culturali</setSpec><setSpec>collezione_pansa_villa_frigerj</setSpec></header><metadata>
<pico:record xmlns:pico="http://purl.org/pico/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:iccd="http://purl.org/pico/iccd/2.00/" xmlns:oad="http://purl.org/pico/iccd/2.00/oa-d-n/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:smi="http://purl.org/pico/iccd/2.00/s-mi/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:bdm="http://purl.org/pico/iccd/2.00/bdm/" xmlns:mets="http://www.loc.gov/METS/" xmlns:f="http://purl.org/pico/iccd/2.00/f/" xmlns:vra="http://www.vraweb.org/vracore4.htm" xmlns:iccd3="http://purl.org/pico/iccd/3.00/" xmlns:mix="http://www.loc.gov/mix/v20" xmlns:nu="http://purl.org/pico/iccd/3.00/nu/" xmlns:premis="info:lc/xmlns/premis-v2" xsi:schemaLocation="http://purl.org/pico/1.0/               http://www.culturaitalia.it/pico/schemas/1.0/pico.xsd                     http://purl.org/pico/iccd/2.00/         http://www.culturaitalia.it/pico/schemas/iccd/2.00/iccd.xsd                     http://purl.org/pico/iccd/2.00/oa-d-n/  http://www.culturaitalia.it/pico/schemas/iccd/2.00/oa-d-n.xsd                     http://purl.org/pico/iccd/2.00/s-mi/    http://www.culturaitalia.it/pico/schemas/iccd/2.00/s-mi.xsd                     http://purl.org/pico/iccd/2.00/bdm/     http://www.culturaitalia.it/pico/schemas/iccd/2.00/bdm.xsd                     http://purl.org/pico/iccd/2.00/f/       http://www.culturaitalia.it/pico/schemas/iccd/2.00/f.xsd                     http://purl.org/pico/iccd/3.00/         http://www.culturaitalia.it/pico/schemas/iccd/3.00/iccd.xsd                     http://purl.org/pico/iccd/3.00/nu/      http://www.culturaitalia.it/pico/schemas/iccd/3.00/nu.xsd">
  <dc:identifier>work_46880</dc:identifier>
  <dc:title>BROCCHETTA MINIATURISTICA</dc:title>
  <dc:subject xsi:type="pico:Thesaurus">http://culturaitalia.it/pico/thesaurus/4.1#reperti_archeologici</dc:subject>
  <dc:description xml:lang="it">BROCCHETTA MONOANSATA. ANSA A DOPPIO BASTONCELLO ARCUATO CHE SI SALDA SULCOLLO AL DI SOTTO DEL LABBRO ESPANSO. CORPO BACCELLATO CON INCISIONE AD XSOTTO L'ANSA, BASSO PIEDE TRONCOCONICO. VERNICE MALCOTTA CON AVVAMPATURESUL PIEDE.</dc:description>
  <dcterms:spatial>Museo Archeologico Nazionale d'Abruzzo, Villa Frigerj, CHIETI (CH) - ITALIA - sala collezione Pansa - vetrina 1, inv. 3130</dcterms:spatial>
  <dcterms:spatial xsi:type="pico:ISTAT">name=CHIETI; year=2001; code=069022</dcterms:spatial>
  <dcterms:created>SEC. III A.C.</dcterms:created>
  <dcterms:created xsi:type="dcterms:Period">start=299; end=250</dcterms:created>
  <dc:type xsi:type="mdi:Type">Opere</dc:type>
  <dc:type xml:lang="it">BROCCHETTA MINIATURISTICA</dc:type>
  <dc:type xsi:type="dcterms:DCMIType">PhysicalObject</dc:type>
  <dcterms:isPartOf xsi:type="dcterms:URI">oai:culturaitalia.it:museiditalia-coll_445</dcterms:isPartOf>
  <dc:rights xml:lang="it"/>
  <dcterms:rightsHolder xml:lang="it">PROPRIETA' STATO, Ministero per i Beni e le Attività Culturali</dcterms:rightsHolder>
  <dcterms:isReferencedBy xml:lang="it">Scheda ICCD RA: 13-00008576</dcterms:isReferencedBy>
  <pico:materialAndTechnique xml:lang="it">ARGILLA</pico:materialAndTechnique>
  <dcterms:extent>altezza: cm 9.4</dcterms:extent>
  <dcterms:extent>diametro: cm 6.9</dcterms:extent>
  <pico:preview xsi:type="dcterms:URI">http://194.242.241.163/fedora/objects/work:46880/datastreams/MM135934/content</pico:preview>
  <dcterms:isReferencedBy xsi:type="pico:Anchor">title=visualizza il file Mets; URL=fedora/objects/work:46880/datastreams/export/content</dcterms:isReferencedBy>
</pico:record>
</metadata></record>

How can I write the output of my HTTP request to a CSV file? Maybe using Pandas?

Regards

I advise you to use JSON format, which is easier to deal with in Python; you can manipulate it however you want. But look at this post, it may be helpful for you.

You can parse some of the data with regular expressions.

import re
import pandas as pd

# `sample` is the XML text from the question (the `data` variable there).
# I like to "tokenize" text, if possible.
tokens = [i.strip() for i in sample.split('\n') if len(i) > 0]

# Create a regular expression pattern for  tag values and text values
# Note: the ?P<> part is how we can identify each matching section.
full_pat = r"<(?P<tag>[a-z0-9:\"\.:\= ]+)>(?P<text>[\w\d ]+)<?/?"

# Compile it (for speed, I think)
# The re.I flag means to ignore whether the letter is uppercase or lowercase
p = re.compile(full_pat, flags=re.I)

results_dict = dict()
for i, v in enumerate(tokens):
    res = p.search(v)
    try:
        # Append a dictionary with our tag and text values to our results dictionary.
        results_dict[i] = dict(tag=res.group('tag'), text=res.group('text'))
    except AttributeError:
        pass

Output of results_dict:

{0: {'tag': 'identifier', 'text': 'oai'},
 2: {'tag': 'dc:identifier', 'text': 'work_46880'},
 3: {'tag': 'dc:title', 'text': 'BROCCHETTA MINIATURISTICA'},
 4: {'tag': 'dc:subject xsi:type="pico:Thesaurus"', 'text': 'http'},
 5: {'tag': 'dc:description xml:lang="it"', 'text': 'BROCCHETTA MONOANSATA'},
 6: {'tag': 'dcterms:spatial', 'text': 'Museo Archeologico Nazionale d'},
 7: {'tag': 'dcterms:spatial xsi:type="pico:ISTAT"', 'text': 'name'},
 8: {'tag': 'dcterms:created', 'text': 'SEC'},
 9: {'tag': 'dcterms:created xsi:type="dcterms:Period"', 'text': 'start'},
 10: {'tag': 'dc:type xsi:type="mdi:Type"', 'text': 'Opere'},
 11: {'tag': 'dc:type xml:lang="it"', 'text': 'BROCCHETTA MINIATURISTICA'},
 12: {'tag': 'dc:type xsi:type="dcterms:DCMIType"', 'text': 'PhysicalObject'},
 13: {'tag': 'dcterms:isPartOf xsi:type="dcterms:URI"', 'text': 'oai'},
 15: {'tag': 'dcterms:rightsHolder xml:lang="it"', 'text': 'PROPRIETA'},
 16: {'tag': 'dcterms:isReferencedBy xml:lang="it"', 'text': 'Scheda ICCD RA'},
 17: {'tag': 'pico:materialAndTechnique xml:lang="it"', 'text': 'ARGILLA'},
 18: {'tag': 'dcterms:extent', 'text': 'altezza'},
 19: {'tag': 'dcterms:extent', 'text': 'diametro'},
 20: {'tag': 'pico:preview xsi:type="dcterms:URI"', 'text': 'http'},
 21: {'tag': 'dcterms:isReferencedBy xsi:type="pico:Anchor"', 'text': 'title'}}
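
Note that the text values above stop at the first character the `text` group does not accept (anything other than word characters and spaces), which is why `oai`, `http` and `Museo Archeologico Nazionale d` are cut short. If the full values are needed, one option is a real XML parser. The following is only a minimal sketch using the standard library's `xml.etree.ElementTree` and `csv` modules. `SAMPLE_XML` is a shortened copy of the record from the question, and the helper names `qualified_name`, `record_to_rows` and `rows_to_csv` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (assumption: the real
# response contains more fields and more than one <record>).
SAMPLE_XML = """<record><header><identifier>oai:culturaitalia.it:oai:culturaitalia.it:museiditalia-work_46880</identifier></header><metadata>
<pico:record xmlns:pico="http://purl.org/pico/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>work_46880</dc:identifier>
  <dc:title>BROCCHETTA MINIATURISTICA</dc:title>
  <dcterms:created>SEC. III A.C.</dcterms:created>
  <dcterms:extent>altezza: cm 9.4</dcterms:extent>
  <dcterms:extent>diametro: cm 6.9</dcterms:extent>
</pico:record>
</metadata></record>"""

# Namespace URIs taken from the XML in the question, mapped to their prefixes.
PREFIXES = {
    "http://purl.org/pico/1.0/": "pico",
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
}


def qualified_name(tag):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"{PREFIXES.get(uri, uri)}:{local}"
    return tag


def record_to_rows(xml_text):
    """Collect one {'tag', 'text'} row per leaf element that has text."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text and len(elem) == 0:  # skip containers and empty elements
            rows.append({"tag": qualified_name(elem.tag), "text": text})
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a 'tag,text' header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["tag", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = record_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead of a string, the same `csv.DictWriter` can be pointed at a file opened with `newline=''`, for example `open('records.csv', 'w', newline='')`, where `records.csv` is just a placeholder name.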

Convert to a Pandas DataFrame and use the .to_csv() function to write a CSV file (I'll let you figure that part out). Note: we have to make sure our dictionary is parsed correctly, so we set the orientation to 'index' instead of the default value, 'columns'.

# from_dict is a classmethod, so call it on the class, not on an empty instance.
df = pd.DataFrame.from_dict(results_dict, orient='index')
print(df)

Output:

                                              tag                            text
0                                      identifier                             oai
2                                   dc:identifier                      work_46880
3                                        dc:title       BROCCHETTA MINIATURISTICA
4            dc:subject xsi:type="pico:Thesaurus"                            http
5                    dc:description xml:lang="it"           BROCCHETTA MONOANSATA
6                                 dcterms:spatial  Museo Archeologico Nazionale d
7           dcterms:spatial xsi:type="pico:ISTAT"                            name
8                                 dcterms:created                             SEC
9       dcterms:created xsi:type="dcterms:Period"                           start
10                    dc:type xsi:type="mdi:Type"                           Opere
11                          dc:type xml:lang="it"       BROCCHETTA MINIATURISTICA
12            dc:type xsi:type="dcterms:DCMIType"                  PhysicalObject
13        dcterms:isPartOf xsi:type="dcterms:URI"                             oai
15             dcterms:rightsHolder xml:lang="it"                       PROPRIETA
16           dcterms:isReferencedBy xml:lang="it"                  Scheda ICCD RA
17        pico:materialAndTechnique xml:lang="it"                         ARGILLA
18                                 dcterms:extent                         altezza
19                                 dcterms:extent                        diametro
20            pico:preview xsi:type="dcterms:URI"                            http
21  dcterms:isReferencedBy xsi:type="pico:Anchor"                           title
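
For the `.to_csv()` step left as an exercise above, a minimal sketch might look like the following. It rebuilds a small `results_dict` from three of the entries shown earlier so that it runs on its own, and it writes into an in-memory `io.StringIO` buffer as a stand-in for a real file path.

```python
import io

import pandas as pd

# Three entries copied from the results_dict output above.
results_dict = {
    2: {"tag": "dc:identifier", "text": "work_46880"},
    3: {"tag": "dc:title", "text": "BROCCHETTA MINIATURISTICA"},
    17: {"tag": 'pico:materialAndTechnique xml:lang="it"', "text": "ARGILLA"},
}

# orient='index' makes each outer key a row and each inner key a column.
df = pd.DataFrame.from_dict(results_dict, orient="index")

# index=False leaves the 2, 3, 17 row labels out of the CSV.
buffer = io.StringIO()  # stands in for a file path
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path instead of the buffer, for example `df.to_csv('records.csv', index=False)` with `records.csv` as a placeholder name, writes the same content to disk. Tag values that contain double quotes are quoted and escaped by pandas automatically.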
