Read .xls file with Python pandas read_excel not working, says it is a .xlsb file

I'm trying to read several .xls files, saved on a NAS folder, with Apache Airflow, using the pandas read_excel function.

This is the code I'm using:

import pandas as pd

df = pd.read_excel('folder/sub_folder_1/sub_folder_2/file_name.xls', sheet_name='April', usecols=[0, 1, 2, 3], dtype=str, engine='xlrd')

This worked for a time, but recently I have been getting this error for several of those files:

Excel 2007 xlsb file; not supported

[...]

xlrd.biffh.XLRDError: Excel 2007 xlsb file; not supported

These files clearly have the .xls extension, yet my code seems to detect them as .xlsb files, which are not supported. I would prefer a way to specify that they are .xls files or, alternatively, a way to read .xlsb files.

Not sure if this is relevant, but these files are updated by an external team, who may have changed something about them without my knowing. If that were the case, though, I would expect a different error.

Try:

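import pandas as pd

# Requires the openpyxl package, which reads .xlsx/.xlsm workbooks.
# If the file is really an .xlsb workbook, engine='pyxlsb' (pyxlsb package) is the one to use.
xls = pd.ExcelFile('data.xls', engine='openpyxl')
df = pd.read_excel(xls)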

xlrd recently removed support for every format except legacy .xls (as of version 2.0), so it can no longer read newer Excel formats such as .xlsx.
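Since the error suggests the files' contents no longer match their .xls extension, one option is to inspect the contents and pick the engine accordingly. The following is only a minimal sketch: guess_excel_engine and read_excel_any are hypothetical helper names, not part of pandas, and the check assumes that a legacy .xls file starts with the OLE2 signature while .xlsx and .xlsb are ZIP archives containing xl/workbook.xml and xl/workbook.bin respectively. Reading .xlsb through pandas also assumes the pyxlsb package is installed.

```python
import zipfile

# Signature at the start of OLE2 compound files (legacy binary .xls).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def guess_excel_engine(path):
    """Guess a pandas engine name from the file contents, ignoring the extension.

    Returns 'xlrd', 'pyxlsb', 'openpyxl', or None if the format is not recognised.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        return "xlrd"  # genuine legacy .xls
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
        if "xl/workbook.bin" in names:
            return "pyxlsb"  # binary workbook (.xlsb)
        if "xl/workbook.xml" in names:
            return "openpyxl"  # XML workbook (.xlsx/.xlsm)
    return None


def read_excel_any(path, **kwargs):
    """Hypothetical wrapper: call pd.read_excel with the guessed engine."""
    import pandas as pd  # imported here so the detection helper works without pandas

    engine = guess_excel_engine(path)
    if engine is None:
        raise ValueError(f"Unrecognised Excel format: {path}")
    return pd.read_excel(path, engine=engine, **kwargs)
```

With this in place, the call from the question could become read_excel_any('folder/sub_folder_1/sub_folder_2/file_name.xls', sheet_name='April', usecols=[0, 1, 2, 3], dtype=str), so files that were re-saved in another format under the old name would still load.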
