简体   繁体   中英

Python: how import excel file from the web?

I need to import excel file from the link. I tried to do it with

filedlurl = 'https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
    
df = pd.read_excel(filedlurl, skiprows=2)

But the error was XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Exchange' XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Exchange'

then I found the following way from using Pandas to read in excel file from URL - XLRDError

df = pd.read_csv('https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls',
                 #sep='\t',
                 #parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                skiprows=2)
df

Here is also a strange output. How can I get the following table as when manually download from the web?

Your file isn't a CSV or an Excel file. Actual contents are an HTML table (see as follows).

Exchange in {0}, Import(+)/Export(-)
<html>
    <body>
        <table>
            <thead>
                <tr>
                    <td colspan="5">Exchange EE connections in MWh, MW</td>
                </tr><tr>
                    <td colspan="5">Data was last updated 06-01-2021</td>
                </tr><tr>
                    <td></td><td style="text-align:center;">EE net exchange</td><td style="text-align:center;">EE - FI</td><td style="text-align:center;">EE - LV</td><td style="text-align:center;">EE - RU</td>
                </tr>
            </thead><tbody>
                <tr>
                    <td style="text-align:left;">01-01-2021</td><td style="text-align:right;">14575</td><td style="text-align:right;">20969,0</td><td style="text-align:right;">-4884,0</td><td style="text-align:right;">-1510,0</td>
                </tr><tr>
                    <td style="text-align:left;">02-01-2021</td><td style="text-align:right;">12073</td><td style="text-align:right;">22479,0</td><td style="text-align:right;">-8001,0</td><td style="text-align:right;">-2405,0</td>
                </tr><tr>
                    <td style="text-align:left;">03-01-2021</td><td style="text-align:right;">14321</td><td style="text-align:right;">22540,0</td><td style="text-align:right;">-8259,0</td><td style="text-align:right;">40,0</td>
                </tr><tr>
                    <td style="text-align:left;">04-01-2021</td><td style="text-align:right;">14662</td><td style="text-align:right;">17653,0</td><td style="text-align:right;">-5829,0</td><td style="text-align:right;">2838,0</td>
                </tr><tr>
                    <td style="text-align:left;">05-01-2021</td><td style="text-align:right;">13570</td><td style="text-align:right;">13779,0</td><td style="text-align:right;">-5314,0</td><td style="text-align:right;">5105,0</td>
                </tr><tr>
                    <td style="text-align:left;">06-01-2021</td><td style="text-align:right;">6243</td><td style="text-align:right;"></td><td style="text-align:right;"></td><td style="text-align:right;"></td>
                </tr>
            </tbody>
        </table>
    </body>
</html>

Use pd.read_html like so:

import pandas as pd

url = 'https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
dfs = pd.read_html(url)
df = dfs[0]

That you can open your file in Excel is because Excel iterates through possible formats until finding something that works. Eg you can make a tab separated values (which should have extension.tsv) file, append.xls and while it isn't an actual horrible spread sheet format (XLS), Excel will still open it normally. It also does this with HTML data.

First, can download the file using Python this way, using urllib.request :

import urllib.request
url='https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls'
filename='yourfile.csv'
urllib.request.urlretrieve (url,filename)

Then read it in using Pandas:

import pandas as pd
df = pd.read_excel(filename)

Which will give you:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Exchange'

After checking with a real Excel, I find:

在此处输入图像描述

So something is wrong with your file, I suspect. With a correct file, the above method should work.

Latest pandas (pandas==1.2.0) is able to read from url using pd.read_excel when requests is installed.

import pandas
url = "https://www.nordpoolgroup.com/48d3ac/globalassets/marketdata-excel-files/exchange-ee-connections_2021_daily.xls"
df = pandas.read_excel(url)

But your file seems corrupted, it triggers:

ValueError: File is not a recognized excel file

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM