
Downloading an XML file from an API URL in Python

I am trying to download data that is returned as an XML file from an API at the following URL:

URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'

When I open the URL in my web browser, the downloaded XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<OASISReport xmlns="http://www.caiso.com/soa/OASISReport_v1.xsd">
<MessageHeader>
<TimeDate>2018-04-06T15:17:51-00:00</TimeDate>
<Source>OASIS</Source>
<Version>v20131201</Version>
</MessageHeader>
<MessagePayload>
<RTO>
<name>CAISO</name>
<REPORT_ITEM>
<REPORT_HEADER>
<SYSTEM>OASIS</SYSTEM>
<TZ>PPT</TZ>
<REPORT>PRC_FUEL</REPORT>
<UOM>US$</UOM>
<INTERVAL>ENDING</INTERVAL>
<SEC_PER_INTERVAL>3600</SEC_PER_INTERVAL>
</REPORT_HEADER>
<REPORT_DATA>
<DATA_ITEM>FUEL_PRC</DATA_ITEM>
<RESOURCE_NAME>CISO</RESOURCE_NAME>
<OPR_DATE>2013-09-19</OPR_DATE>
<INTERVAL_NUM>24</INTERVAL_NUM>

However, when I download it with a Python script, I get something very different.

Python script:

import requests

r = requests.get(URL)
r.encoding = "UTF-8"  # note: this only affects r.text, not the raw bytes in r.content
with open('data.xml', 'wb') as file:
    file.write(r.content)

Downloaded file:

PKEL520130919_20130928_PRC_FUEL_N_20180406_08_44_40_v1.xml
(followed by several more lines of unreadable binary data)

I assume it's an encoding issue, but I'm struggling to find a solution.

Thanks in advance for your help!

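The URL you mentioned returns a ZIP archive, not a bare XML file: the "PK" at the start of your downloaded file is the ZIP signature, and the endpoint itself is called SingleZip. Download the archive and extract it to get your XML.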
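A quick way to confirm that the response is a ZIP archive rather than XML is to inspect its first bytes: a ZIP archive with at least one member normally starts with the local file header signature b"PK\x03\x04", which is the "PK" visible at the start of the downloaded file. The minimal sketch below (the function name looks_like_zip is hypothetical) combines that signature check with the standard-library zipfile.is_zipfile, which accepts file-like objects.

```python
import io
import zipfile

# Local file header signature at the start of a typical (non-empty) ZIP archive.
ZIP_SIGNATURE = b"PK\x03\x04"

def looks_like_zip(data: bytes) -> bool:
    """Return True if the raw response bytes appear to be a ZIP archive."""
    # Cheap check on the first four bytes, then let zipfile confirm that
    # a valid end-of-central-directory record is present.
    return data.startswith(ZIP_SIGNATURE) and zipfile.is_zipfile(io.BytesIO(data))

# Usage, assuming r is the requests response from the question:
# looks_like_zip(r.content)  # should be True if the server sent a ZIP
```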

Example (Python 3):

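import io
import zipfile

import requests

URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
r = requests.get(URL)
r.raise_for_status()  # fail loudly on HTTP errors
# The body is binary ZIP data, so wrap the bytes in io.BytesIO (not StringIO)
with zipfile.ZipFile(io.BytesIO(r.content)) as z:
    z.extractall()  # writes the XML file(s) into the current directory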
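If you only need the data rather than files on disk, the archive can also be read and parsed entirely in memory. The sketch below assumes the XML layout shown in the question's excerpt: a default namespace of http://www.caiso.com/soa/OASISReport_v1.xsd, with each record in a REPORT_DATA element. The function name parse_report_data is hypothetical, and the keys it returns are simply whatever child elements each REPORT_DATA actually contains.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Default namespace declared on <OASISReport> in the sample response.
NS = {"oasis": "http://www.caiso.com/soa/OASISReport_v1.xsd"}

def parse_report_data(zip_bytes: bytes) -> list:
    """Return one dict per <REPORT_DATA> element across all XML files in the ZIP."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip any non-XML members
            root = ET.fromstring(zf.read(name))
            for item in root.iterfind(".//oasis:REPORT_DATA", NS):
                # Strip the "{namespace}" prefix from each child tag.
                rows.append({child.tag.split("}", 1)[-1]: child.text for child in item})
    return rows
```

For example, after r = requests.get(URL), calling parse_report_data(r.content) should give one dict per record, with keys such as OPR_DATE and INTERVAL_NUM if the data matches the excerpt.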

You could also use urllib to save the ZIP file to disk and extract it afterwards:

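import urllib.request

# Python 3: urlretrieve lives in urllib.request (it was urllib.urlretrieve in Python 2).
# This saves the ZIP archive itself; it still has to be extracted.
urllib.request.urlretrieve(
    "http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1",
    "oasis.zip")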
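Since urlretrieve only saves the archive, an extraction step is still needed afterwards. The following sketch does both using only the standard library (urllib.request.urlopen plus zipfile); the function name fetch_and_extract and its default destination directory are illustrative assumptions, not part of the original answer. It reads the whole response into memory, which is fine for a report of this size but not ideal for very large downloads.

```python
import io
import zipfile
from urllib.request import urlopen

def fetch_and_extract(url: str, dest_dir: str = ".") -> list:
    """Download a ZIP from url, extract it into dest_dir, and return the member names."""
    with urlopen(url) as resp:
        data = resp.read()  # the whole archive as bytes
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest_dir)  # creates dest_dir if it does not exist
        return zf.namelist()

# Usage with the question's URL (hypothetical output directory):
# names = fetch_and_extract(URL, "oasis_data")
# print(names)  # the XML file name(s) contained in the archive
```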
