How to write CSV from XML response in Python?
With the following HTTP request:
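import requests
import csv

# OAI-PMH ListRecords request for the collezione_pansa_villa_frigerj set
url = 'http://www.culturaitalia.it/oaiProviderCI/OAIHandler?verb=ListRecords&metadataPrefix=pico&set=collezione_pansa_villa_frigerj'
e = requests.get(url)
data = e.text  # the response body as a string
print(data)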
I get this XML as output:
<record><header><identifier>oai:culturaitalia.it:oai:culturaitalia.it:museiditalia-work_46880</identifier><datestamp>2018-08-29T17:56:41Z</datestamp><setSpec>museiditalia_opere</setSpec><setSpec>opere_museid</setSpec><setSpec>Beni_culturali</setSpec><setSpec>collezione_pansa_villa_frigerj</setSpec></header><metadata>
<pico:record xmlns:pico="http://purl.org/pico/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:iccd="http://purl.org/pico/iccd/2.00/" xmlns:oad="http://purl.org/pico/iccd/2.00/oa-d-n/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:smi="http://purl.org/pico/iccd/2.00/s-mi/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:bdm="http://purl.org/pico/iccd/2.00/bdm/" xmlns:mets="http://www.loc.gov/METS/" xmlns:f="http://purl.org/pico/iccd/2.00/f/" xmlns:vra="http://www.vraweb.org/vracore4.htm" xmlns:iccd3="http://purl.org/pico/iccd/3.00/" xmlns:mix="http://www.loc.gov/mix/v20" xmlns:nu="http://purl.org/pico/iccd/3.00/nu/" xmlns:premis="info:lc/xmlns/premis-v2" xsi:schemaLocation="http://purl.org/pico/1.0/ http://www.culturaitalia.it/pico/schemas/1.0/pico.xsd http://purl.org/pico/iccd/2.00/ http://www.culturaitalia.it/pico/schemas/iccd/2.00/iccd.xsd http://purl.org/pico/iccd/2.00/oa-d-n/ http://www.culturaitalia.it/pico/schemas/iccd/2.00/oa-d-n.xsd http://purl.org/pico/iccd/2.00/s-mi/ http://www.culturaitalia.it/pico/schemas/iccd/2.00/s-mi.xsd http://purl.org/pico/iccd/2.00/bdm/ http://www.culturaitalia.it/pico/schemas/iccd/2.00/bdm.xsd http://purl.org/pico/iccd/2.00/f/ http://www.culturaitalia.it/pico/schemas/iccd/2.00/f.xsd http://purl.org/pico/iccd/3.00/ http://www.culturaitalia.it/pico/schemas/iccd/3.00/iccd.xsd http://purl.org/pico/iccd/3.00/nu/ http://www.culturaitalia.it/pico/schemas/iccd/3.00/nu.xsd">
<dc:identifier>work_46880</dc:identifier>
<dc:title>BROCCHETTA MINIATURISTICA</dc:title>
<dc:subject xsi:type="pico:Thesaurus">http://culturaitalia.it/pico/thesaurus/4.1#reperti_archeologici</dc:subject>
<dc:description xml:lang="it">BROCCHETTA MONOANSATA. ANSA A DOPPIO BASTONCELLO ARCUATO CHE SI SALDA SULCOLLO AL DI SOTTO DEL LABBRO ESPANSO. CORPO BACCELLATO CON INCISIONE AD XSOTTO L'ANSA, BASSO PIEDE TRONCOCONICO. VERNICE MALCOTTA CON AVVAMPATURESUL PIEDE.</dc:description>
<dcterms:spatial>Museo Archeologico Nazionale d'Abruzzo, Villa Frigerj, CHIETI (CH) - ITALIA - sala collezione Pansa - vetrina 1, inv. 3130</dcterms:spatial>
<dcterms:spatial xsi:type="pico:ISTAT">name=CHIETI; year=2001; code=069022</dcterms:spatial>
<dcterms:created>SEC. III A.C.</dcterms:created>
<dcterms:created xsi:type="dcterms:Period">start=299; end=250</dcterms:created>
<dc:type xsi:type="mdi:Type">Opere</dc:type>
<dc:type xml:lang="it">BROCCHETTA MINIATURISTICA</dc:type>
<dc:type xsi:type="dcterms:DCMIType">PhysicalObject</dc:type>
<dcterms:isPartOf xsi:type="dcterms:URI">oai:culturaitalia.it:museiditalia-coll_445</dcterms:isPartOf>
<dc:rights xml:lang="it"/>
<dcterms:rightsHolder xml:lang="it">PROPRIETA' STATO, Ministero per i Beni e le Attività Culturali</dcterms:rightsHolder>
<dcterms:isReferencedBy xml:lang="it">Scheda ICCD RA: 13-00008576</dcterms:isReferencedBy>
<pico:materialAndTechnique xml:lang="it">ARGILLA</pico:materialAndTechnique>
<dcterms:extent>altezza: cm 9.4</dcterms:extent>
<dcterms:extent>diametro: cm 6.9</dcterms:extent>
<pico:preview xsi:type="dcterms:URI">http://194.242.241.163/fedora/objects/work:46880/datastreams/MM135934/content</pico:preview>
<dcterms:isReferencedBy xsi:type="pico:Anchor">title=visualizza il file Mets; URL=fedora/objects/work:46880/datastreams/export/content</dcterms:isReferencedBy>
</pico:record>
</metadata></record>
How can I write the output of my HTTP request to a CSV file? Maybe using pandas?
Regards
You can parse some of the data with regular expressions.
import re
import pandas as pd

# `sample` is assumed to hold the XML text shown in the question (e.g. sample = data).
# I like to "tokenize" text, if possible: split into lines, drop empty ones, strip whitespace.
tokens = [i.strip() for i in sample.split('\n') if len(i) > 0]

# Create a regular expression pattern for tag values and text values.
# Note: the (?P<name>...) syntax names each group so it can be read back with res.group('name').
full_pat = r"<(?P<tag>[a-z0-9:\"\.:\= ]+)>(?P<text>[\w\d ]+)<?/?"

# Compile the pattern once so it can be reused on every line.
# The re.I flag makes the match case-insensitive.
p = re.compile(full_pat, flags=re.I)

results_dict = dict()
for i, v in enumerate(tokens):
    res = p.search(v)
    if res is None:
        # No <tag>text pair on this line, so skip it.
        continue
    # Store the tag and text values, keyed by the line's position in tokens.
    results_dict[i] = dict(tag=res.group('tag'), text=res.group('text'))
Output of results_dict:
{0: {'tag': 'identifier', 'text': 'oai'},
2: {'tag': 'dc:identifier', 'text': 'work_46880'},
3: {'tag': 'dc:title', 'text': 'BROCCHETTA MINIATURISTICA'},
4: {'tag': 'dc:subject xsi:type="pico:Thesaurus"', 'text': 'http'},
5: {'tag': 'dc:description xml:lang="it"', 'text': 'BROCCHETTA MONOANSATA'},
6: {'tag': 'dcterms:spatial', 'text': 'Museo Archeologico Nazionale d'},
7: {'tag': 'dcterms:spatial xsi:type="pico:ISTAT"', 'text': 'name'},
8: {'tag': 'dcterms:created', 'text': 'SEC'},
9: {'tag': 'dcterms:created xsi:type="dcterms:Period"', 'text': 'start'},
10: {'tag': 'dc:type xsi:type="mdi:Type"', 'text': 'Opere'},
11: {'tag': 'dc:type xml:lang="it"', 'text': 'BROCCHETTA MINIATURISTICA'},
12: {'tag': 'dc:type xsi:type="dcterms:DCMIType"', 'text': 'PhysicalObject'},
13: {'tag': 'dcterms:isPartOf xsi:type="dcterms:URI"', 'text': 'oai'},
15: {'tag': 'dcterms:rightsHolder xml:lang="it"', 'text': 'PROPRIETA'},
16: {'tag': 'dcterms:isReferencedBy xml:lang="it"', 'text': 'Scheda ICCD RA'},
17: {'tag': 'pico:materialAndTechnique xml:lang="it"', 'text': 'ARGILLA'},
18: {'tag': 'dcterms:extent', 'text': 'altezza'},
19: {'tag': 'dcterms:extent', 'text': 'diametro'},
20: {'tag': 'pico:preview xsi:type="dcterms:URI"', 'text': 'http'},
21: {'tag': 'dcterms:isReferencedBy xsi:type="pico:Anchor"', 'text': 'title'}}
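Several values above are cut short ('oai', 'http', 'altezza') because the text group [\w\d ]+ only matches word characters and spaces, so it stops at the first ':' or '.'. A quick check of the same pattern on three lines taken from the XML makes this visible. The third line is the opening pico:record tag, shortened here to one namespace declaration; it never matches because '/' is not allowed in the tag group, which is why index 1 is missing from results_dict.

```python
import re

# The same pattern and flag as in the answer.
full_pat = r"<(?P<tag>[a-z0-9:\"\.:\= ]+)>(?P<text>[\w\d ]+)<?/?"
p = re.compile(full_pat, flags=re.I)

lines = [
    '<dc:title>BROCCHETTA MINIATURISTICA</dc:title>',        # letters and a space only
    '<dcterms:extent>altezza: cm 9.4</dcterms:extent>',      # text group stops at ':'
    '<pico:record xmlns:pico="http://purl.org/pico/1.0/">',  # '/' is not allowed in the tag group
]

results = [p.search(line) for line in lines]
for m in results:
    # groupdict() maps each named group to what it captured; None means no match.
    print(m.groupdict() if m else None)
# {'tag': 'dc:title', 'text': 'BROCCHETTA MINIATURISTICA'}
# {'tag': 'dcterms:extent', 'text': 'altezza'}
# None
```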
Convert the dictionary to a pandas DataFrame and use the .to_csv() method to write a CSV file (I'll let you figure that part out). Note: we pass orient='index' so that each key of results_dict becomes a row; with the default, orient='columns', each key would become a column instead.
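For the .to_csv() step, a minimal sketch follows. It uses a few entries copied from results_dict so it runs on its own; the file name records.csv is a placeholder, and index=False is a choice to drop the line-number index (remove it to keep those numbers as the first column).

```python
import pandas as pd

# A few entries copied from results_dict above so the example runs on its own.
results_dict = {
    2: {'tag': 'dc:identifier', 'text': 'work_46880'},
    3: {'tag': 'dc:title', 'text': 'BROCCHETTA MINIATURISTICA'},
    4: {'tag': 'dc:subject xsi:type="pico:Thesaurus"', 'text': 'http'},
}

df = pd.DataFrame.from_dict(results_dict, orient='index')

# "records.csv" is a hypothetical file name; index=False leaves out the line-number index.
df.to_csv('records.csv', index=False, encoding='utf-8')

# Read the file back to check the round trip. In the file itself, the tag that
# contains double quotes is wrapped in quotes and its inner quotes are doubled.
check = pd.read_csv('records.csv')
print(check)
```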
df = pd.DataFrame.from_dict(results_dict, orient='index')
print(df)
Output:
                                              tag                            text
0                                      identifier                             oai
2                                   dc:identifier                      work_46880
3                                        dc:title       BROCCHETTA MINIATURISTICA
4            dc:subject xsi:type="pico:Thesaurus"                            http
5                    dc:description xml:lang="it"           BROCCHETTA MONOANSATA
6                                 dcterms:spatial  Museo Archeologico Nazionale d
7           dcterms:spatial xsi:type="pico:ISTAT"                            name
8                                 dcterms:created                             SEC
9       dcterms:created xsi:type="dcterms:Period"                           start
10                    dc:type xsi:type="mdi:Type"                           Opere
11                          dc:type xml:lang="it"       BROCCHETTA MINIATURISTICA
12            dc:type xsi:type="dcterms:DCMIType"                  PhysicalObject
13        dcterms:isPartOf xsi:type="dcterms:URI"                             oai
15             dcterms:rightsHolder xml:lang="it"                       PROPRIETA
16           dcterms:isReferencedBy xml:lang="it"                  Scheda ICCD RA
17        pico:materialAndTechnique xml:lang="it"                         ARGILLA
18                                 dcterms:extent                         altezza
19                                 dcterms:extent                        diametro
20            pico:preview xsi:type="dcterms:URI"                            http
21  dcterms:isReferencedBy xsi:type="pico:Anchor"                           title
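Because the regex text group stops at ':' and '.', URLs, identifiers and measurements lose most of their content (see the http, oai and altezza rows). As an alternative that is not part of the original answer, here is a minimal sketch using the standard library's xml.etree.ElementTree, which resolves namespaces and keeps each element's full text. SAMPLE is a trimmed, hypothetical copy of the record from the question, and PREFIXES, short_name and record_rows are illustrative names; for the live response you would parse e.content (the raw bytes) instead. The csv module the question already imports writes one identifier, tag, text row per element.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of one <pico:record> from the question, so this runs offline.
SAMPLE = """<pico:record xmlns:pico="http://purl.org/pico/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:identifier>work_46880</dc:identifier>
<dc:title>BROCCHETTA MINIATURISTICA</dc:title>
<dc:subject xsi:type="pico:Thesaurus">http://culturaitalia.it/pico/thesaurus/4.1#reperti_archeologici</dc:subject>
<dcterms:extent>altezza: cm 9.4</dcterms:extent>
</pico:record>"""

# ElementTree reports tags as "{namespace-uri}local"; this map restores the prefixes.
PREFIXES = {
    "http://purl.org/pico/1.0/": "pico",
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
}
PICO_RECORD = "{http://purl.org/pico/1.0/}record"
DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"


def short_name(qname):
    """Turn '{uri}local' into 'prefix:local'; unknown URIs keep the full URI."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return f"{PREFIXES.get(uri, uri)}:{local}"
    return qname


def record_rows(root):
    """Yield (identifier, tag, text) for every child of every <pico:record>."""
    for record in root.iter(PICO_RECORD):  # iter() also checks root itself
        ident = record.findtext(DC_IDENTIFIER, default="")
        for child in record:
            yield ident, short_name(child.tag), (child.text or "").strip()


root = ET.fromstring(SAMPLE)  # for the live response: ET.fromstring(e.content)
rows = list(record_rows(root))

# Write to an in-memory buffer; to write a real file, use
# open("records.csv", "w", newline="", encoding="utf-8") (hypothetical name) instead.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["identifier", "tag", "text"])
writer.writerows(rows)
print(buf.getvalue())
```

Attributes such as xsi:type and xml:lang are not written here; each child's attrib dictionary holds them if you need extra columns.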