Force an XML file to be saved in XLS format in Python

Question

I have this code here, which downloads this fund data in Excel 2004 XML format:

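from urllib.request import urlopen  # Python 3; urllib2 exists only in Python 2

url = 'https://www.ishares.com/us/258100/fund-download.dl'
contents = urlopen(url).read()
# The response body is bytes, so open the file in binary mode
with open("export.xml", 'wb') as file:
    file.write(contents)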

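My goal is to convert this file to .xls programmatically, so that I can then read it into a pandas DataFrame. I know I could parse the file with Python's xml library. However, I noticed that if I open the XML file and manually save it with an .xls file extension, pandas can read it and I get the result I want.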
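For reference, the xml-library route mentioned above can be sketched as follows. This is only a minimal sketch, assuming the download is well-formed SpreadsheetML (the Excel 2003/2004 XML Spreadsheet format); the function name `read_spreadsheetml` and the tiny `SAMPLE` document are made up for illustration and stand in for the real file.

```python
import xml.etree.ElementTree as ET

# Namespace used by Excel 2003/2004 XML Spreadsheet (SpreadsheetML) files.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(xml_text, sheet_index=0):
    """Return the rows of one worksheet as a list of lists of strings."""
    root = ET.fromstring(xml_text)
    sheet = root.findall("ss:Worksheet", NS)[sheet_index]
    rows = []
    for row in sheet.iterfind("ss:Table/ss:Row", NS):
        values = []
        for cell in row.iterfind("ss:Cell", NS):
            # ss:Index (1-based) marks a jump over empty cells; pad the gap.
            index = cell.get("{%s}Index" % SS)
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            values.append("" if data is None or data.text is None else data.text)
        rows.append(values)
    return rows


# Hand-written sample (hypothetical data), not the actual iShares download.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Holdings">
    <Table>
      <Row><Cell><Data ss:Type="String">Name</Data></Cell><Cell><Data ss:Type="String">Price</Data></Cell></Row>
      <Row><Cell><Data ss:Type="String">Bond A</Data></Cell><Cell><Data ss:Type="Number">101.5</Data></Cell></Row>
      <Row><Cell ss:Index="2"><Data ss:Type="Number">99.0</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>"""

rows = read_spreadsheetml(SAMPLE)
print(rows)
```

The resulting rows could then be handed to pandas, for example with `pd.DataFrame(rows[1:], columns=rows[0])`. Any metadata rows that precede the real header in the downloaded file would still need to be skipped first.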

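I also tried renaming the file extension with the code below, but that approach does not "force" a save: the file is still the underlying XML document, just with an .xls file extension.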

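import os

# Expand "~" explicitly; os.listdir does not do it
folder = os.path.expanduser('~/models')
for filename in os.listdir(folder):
    if filename.startswith('export'):
        # Swap the extension for .xls (the file contents stay untouched)
        newname = os.path.splitext(filename)[0] + '.xls'
        os.rename(os.path.join(folder, filename), os.path.join(folder, newname))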
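To illustrate why a rename alone changes nothing, here is a small sketch that guesses a file's real format from its first bytes rather than from its name. The helper name `sniff_format` and the throwaway temp file are assumptions made for the example. The byte signatures are the standard ones: a binary .xls is an OLE2 compound file, and an .xlsx is a ZIP archive.

```python
import os
import tempfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(path):
    """Guess the real format of a file from its first bytes, ignoring its name."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(OLE2_MAGIC):
        return "xls-binary"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Drop an optional UTF-8 byte order mark, then any leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export.xml")
    dst = os.path.join(tmp, "export.xls")
    with open(src, "wb") as f:
        f.write(b'<?xml version="1.0"?><Workbook/>')
    before = sniff_format(src)
    os.rename(src, dst)          # only the name changes
    after = sniff_format(dst)

print(before, after)
```

Both checks report the same format, because `os.rename` never touches the bytes on disk. Saving from Excel, by contrast, rewrites the contents in the target format.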

https://www.ishares.com/us/258100/fund-download.dl

Answer 1

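For Excel on Windows, consider using the win32com module to connect from Python to the Excel object library over COM. Specifically, use Excel's Workbooks.OpenXML and SaveAs methods to save the downloaded XML as CSV: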

import os
import win32com.client as win32    
import requests as r
import pandas as pd

cd = os.path.dirname(os.path.abspath(__file__))

url = "http://www.ishares.com/us/258100/fund-download.dl"
xmlfile = os.path.join(cd, 'iSharesDownload.xml')
csvfile = os.path.join(cd, 'iSharesDownload.csv')

# DOWNLOAD FILE
try:
    rqpage = r.get(url)
    with open(xmlfile, 'wb') as f:
        f.write(rqpage.content)    
except Exception as e:
    print(e)    
finally:
    rqpage = None

# EXCEL COM TO SAVE EXCEL XML AS CSV
if os.path.exists(csvfile):
    os.remove(csvfile)
try:
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.OpenXML(xmlfile)
    wb.SaveAs(csvfile, 6)  # 6 = xlCSV in Excel's XlFileFormat enumeration
    wb.Close(True)    
except Exception as e:
    print(e)    
finally:
    # RELEASES RESOURCES
    wb = None
    excel = None

# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())

#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000

Answer 2

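For Excel for Mac, consider a VBA solution, since VBA is the most common language for interfacing with the Excel object library. The code below downloads the iShares XML and then saves it as CSV, using the OpenXML and SaveAs methods, so that it can be imported into pandas.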

Note: this is untested on Mac, but hopefully the Microsoft.XMLHTTP object is available.

VBA (save in a macro-enabled workbook)

Option Explicit

Sub DownloadXML()
On Error GoTo ErrHandle
    Dim wb As Workbook
    Dim xmlDoc As Object
    Dim xmlfile As String, csvfile As String

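    ' Application.PathSeparator avoids hard-coding the Windows "\" separator
    xmlfile = ActiveWorkbook.Path & Application.PathSeparator & "file.xml"
    csvfile = ActiveWorkbook.Path & Application.PathSeparator & "file.csv"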

    Call DownloadFile("https://www.ishares.com/us/258100/fund-download.dl", xmlfile)

    Set wb = Excel.Workbooks.OpenXML(xmlfile)

    wb.SaveAs csvfile, 6 ' 6 = xlCSV
    wb.Close True

ExitHandle:
    Set wb = Nothing
    Set xmlDoc = Nothing
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Resume ExitHandle
End Sub

Function DownloadFile(url As String, filePath As String)
    Dim WinHttpReq As Object, oStream As Object

    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", url, False
    WinHttpReq.send

    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile filePath, 2 ' 1 = no overwrite, 2 = overwrite
        oStream.Close
    End If

    Set WinHttpReq = Nothing
    Set oStream = Nothing
End Function

Python

import pandas as pd

csvfile = "/path/to/file.csv"

# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())

#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000

Answer 3

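I found that the website I was working with has developed an API, which makes it possible to avoid web scraping altogether. The API can then be called with Python's requests module.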

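import requests

url = "https://www.blackrock.com/tools/hackathon/performance"
tickers = []  # fill in the fund tickers to query
for ticker in tickers:
    params = {'identifiers': ticker,
              'returnsType': 'MONTHLY'}
    response = requests.get(url, params=params)
    data = response.json()  # parsed JSON payload for this ticker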
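As a quick way to see what is actually being requested, the sketch below builds the query URL with the standard library alone. For simple string values like these, it is roughly what requests does with the `params` argument. The helper name `build_url` and the ticker `"ABC"` are placeholders invented for the example, not real identifiers.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.blackrock.com/tools/hackathon/performance"


def build_url(ticker, returns_type="MONTHLY"):
    """Append the query parameters to the endpoint as an encoded query string."""
    params = {"identifiers": ticker, "returnsType": returns_type}
    return BASE_URL + "?" + urlencode(params)


example = build_url("ABC")  # "ABC" is a placeholder, not a real ticker
print(example)
```

Pasting the printed URL into a browser is a simple way to inspect the JSON structure before writing any parsing code.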
