What's wrong with iShares S&P 500 ETF's Excel file?

I am trying to parse iShares S&P 500 ETF's Excel file, which looks like this:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="Default">
<ss:Alignment ss:Horizontal="Left"/>
</ss:Style>
...

It looks like an old XLS-type Excel file, but it is actually an XML file. Even so, xml.etree.ElementTree refuses to parse it.

I have tried:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name)

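and with parser options (recovery mode, then an explicit encoding):

import xml.etree.ElementTree as ET
# note: recover=True is an lxml.etree.XMLParser option; the standard-library
# ET.XMLParser does not accept it and raises TypeError before any parsing happens
tree = ET.parse(file_name, parser=ET.XMLParser(recover=True))

import xml.etree.ElementTree as ET
tree = ET.parse(file_name, parser=ET.XMLParser(encoding='utf-8'))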

error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1

I also tried xlrd:

import xlrd
rb = xlrd.open_workbook(file_name, encoding_override='utf-8')
print(rb)

error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf\xef\xbb\xbf<?'
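
The xlrd message shows the first bytes of the file: b'\xef\xbb\xbf' is the UTF-8 encoding of the byte order mark (U+FEFF), and it appears twice before "<?". A minimal sketch for checking this yourself, where count_leading_boms is a hypothetical helper name:

```python
import codecs


def count_leading_boms(data: bytes) -> int:
    """Count the UTF-8 byte order marks that appear back to back at the start of data."""
    bom = codecs.BOM_UTF8  # b"\xef\xbb\xbf"
    count = 0
    # bytes.startswith accepts a start offset, so step forward one BOM at a time
    while data.startswith(bom, count * len(bom)):
        count += 1
    return count


# The leading bytes quoted in the xlrd error message
print(count_leading_boms(b"\xef\xbb\xbf\xef\xbb\xbf<?"))  # 2
```

For the real file, read a few bytes first, e.g. with open(file_name, 'rb') as f: print(count_leading_boms(f.read(16))).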

But none of these seems to work at all. Could anyone point me in the right direction?

Your "XML" file is not well-formed:

  1. It has two byte order mark (BOM) characters before the XML declaration; they are the b'\xef\xbb\xbf\xef\xbb\xbf' bytes shown in the xlrd error.

    A single BOM may precede the declaration, but the second one may not, so remove at least one. See "Convert UTF-8 with BOM to UTF-8 with no BOM in Python".

  2. Remove the & or change it to &amp; on line 43:

     <ss:Data ss:Type="String">iShares Core S&P 500 ETF</ss:Data>
                                             ^
  3. Do the same for the & on line 117,188.

Without the above repairs, your file cannot be parsed as XML (because, technically, it isn't XML).
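
The repairs listed above can also be scripted. A minimal sketch, assuming the file is UTF-8 and the stray ampersands sit in ordinary element text; repair_xml_bytes and BARE_AMPERSAND are hypothetical names. It strips every leading BOM and escapes each & that does not already begin an entity or character reference:

```python
import codecs
import re

# An "&" that does not already start an entity or character reference:
# matches the "&" in "S&P" but not those in "&amp;", "&#38;" or "&#x26;".
BARE_AMPERSAND = re.compile(rb"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def repair_xml_bytes(data: bytes) -> bytes:
    """Drop every leading UTF-8 BOM, then escape bare ampersands as &amp;."""
    bom = codecs.BOM_UTF8  # b"\xef\xbb\xbf"
    while data.startswith(bom):
        data = data[len(bom):]
    return BARE_AMPERSAND.sub(b"&amp;", data)


sample = b"\xef\xbb\xbf\xef\xbb\xbf<a>S&P 500 &amp; more</a>"
print(repair_xml_bytes(sample))  # b'<a>S&amp;P 500 &amp; more</a>'
```

On the real file this would be used as with open(file_name, 'rb') as f: fixed = repair_xml_bytes(f.read()), followed by ET.fromstring(fixed). Because it is regex-based, it does not understand CDATA sections or comments, where a literal & is legal and escaping it would change the text; a bare & followed by something entity-shaped (such as &T;) would also be left alone.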

After the above repairs, you'll be able to parse it as XML, and Excel will be able to open it:

[Image: the repaired file opened in Excel]
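
Once the file is well-formed, the cell values can be read with xml.etree.ElementTree. The ss: namespace (urn:schemas-microsoft-com:office:spreadsheet) belongs to Excel's XML Spreadsheet 2003 format, whose usual layout is Workbook, then Worksheet, Table, Row, Cell and Data. The sketch below assumes that layout; read_rows is a hypothetical helper, and the inline sample only imitates the cell quoted above rather than reproducing the real file:

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the namespace declared on the root element
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}


def read_rows(xml_bytes: bytes) -> list[list[str]]:
    """Return each worksheet row as a list of cell strings.

    Assumes the Workbook/Worksheet/Table/Row/Cell/Data layout and ignores
    ss:Index, so cells in sparse rows are packed to the left.
    """
    root = ET.fromstring(xml_bytes)
    rows = []
    for row in root.iterfind("ss:Worksheet/ss:Table/ss:Row", NS):
        cells = []
        for cell in row.iterfind("ss:Cell", NS):
            data = cell.find("ss:Data", NS)
            # A Cell without a Data child (or with empty Data) becomes ""
            cells.append((data.text or "") if data is not None else "")
        rows.append(cells)
    return rows


sample = (
    b'<?xml version="1.0"?>\n'
    b'<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
    b"<ss:Worksheet><ss:Table><ss:Row><ss:Cell>"
    b'<ss:Data ss:Type="String">iShares Core S&amp;P 500 ETF</ss:Data>'
    b"</ss:Cell></ss:Row></ss:Table></ss:Worksheet></ss:Workbook>"
)
print(read_rows(sample))  # [['iShares Core S&P 500 ETF']]
```

Note that ElementTree turns &amp; back into a plain &, so the repaired text reads exactly as Excel displays it.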
