Download/load an xls file from a URL using pandas

Question

I am trying to load the Excel file from the following URL into a dataframe using Python 3.5 and pandas:

link = "https://hub.coursera-notebooks.org/user/ejquqxfjajkufidbixxvkx/notebooks/Energy%20Indicators.xls"

First I tried to download the file manually using urllib.request so that I could read it right afterwards:

import urllib.request
urllib.request.urlretrieve(link, "Energy Indicators.xls")

I did get a file named "Energy Indicators.xls", but it is not a valid xls file. It looks more like an HTML file with the extension changed to xls.
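One way to confirm that suspicion is to look at the first bytes of the downloaded file. The sketch below is only an illustration: sniff_download is a hypothetical helper name, and the HTML check is a simple heuristic. A legacy .xls workbook is an OLE2 compound file and starts with the signature D0 CF 11 E0 A1 B1 1A E1, whereas an HTML page usually starts with a doctype or an html tag.

```python
# Signature found at the start of OLE2 compound files (legacy .xls workbooks).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_download(path):
    """Return 'xls', 'html', or 'unknown' based on the first bytes of the file.

    Hypothetical helper for illustration; the HTML test is a rough heuristic.
    """
    with open(path, "rb") as fh:
        head = fh.read(512)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    lowered = head.lstrip().lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "unknown"
```

If this returns "html" for the downloaded file, the server most likely sent a web page instead of the workbook, and no Excel reader will be able to parse it.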

Then I tried to load the file directly using read_csv:

import pandas as pd
energy = pd.read_csv(link, skiprows=16, header=0, skipfooter=38)

But I got a traceback: "pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2". If I try to read it without the skiprows, header, and skipfooter arguments, I get another error: "ValueError: Expected 1 fields in line 41, saw 3".

Any ideas? BTW, I am using Mac OS Sierra and PyCharm Community Edition 2016.3.

Answer 1

For this particular Coursera exercise (not in general), you cannot pass the whole URL to the read_excel function; use only the file name "Energy Indicators.xls":

energy = pd.read_excel('Energy Indicators.xls',...)
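A rough sketch of how that could look in full, assuming the file already sits in the notebook's working directory. The skiprows and skipfooter values are copied from the read_csv attempt in the question and may need adjusting for the real sheet; load_energy_indicators is a hypothetical helper name, and reading .xls files needs an Excel engine such as xlrd to be installed.

```python
from pathlib import Path


def load_energy_indicators(path="Energy Indicators.xls", skiprows=16, skipfooter=38):
    """Read the local workbook into a DataFrame (hypothetical helper).

    The default skiprows/skipfooter values come from the question's read_csv
    call and are assumptions about the sheet layout.
    """
    path = Path(path)
    if not path.is_file():
        # Fail early with a clear message instead of a parser error.
        raise FileNotFoundError(
            "{} not found in {}".format(path, Path.cwd())
        )
    # Imported here so the existence check works even without pandas installed.
    import pandas as pd

    return pd.read_excel(str(path), skiprows=skiprows, header=0, skipfooter=skipfooter)
```

Calling energy = load_energy_indicators() inside the Coursera notebook would then read the local copy rather than the URL.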

Answer 1: 2 | accepted | 2017-09-18 02:43:36