简体   繁体   中英

Error (little-endian) reading a XLS file with python

I download a XLS file from the web using selenium.

I tried many options I found in stack-overflow and other websites to read the XLS file :

import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'

And

df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLSX file
Expected "little-endian" marker, found b'\xff\xfe'

And again

from xlrd import open_workbook
book = open_workbook('test.xls') 
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'

I have tried different encoding: utf-8, ANSII, utf_16_be, utf16 I have even tried to get the encoding of the file from notepad or other applications.

Type of file : Microsoft Excel 97-2003 Worksheet (.xls) I can open the file with Excel without any issue. What's frustrating is that if I open the file with excel and just press save I then can read the file with of the previous python command.

I would be really grateful if someone could provide me other ideas I could try. I need to open this file with a python script only.

Thanks, Max

Solution (Somewhat messy but simple) that could potentially work for any type of Excel file :

Called VBA from python to Open and save the file in Excel. Excel "clean-up" the file and then Python is able to read it with any read Excel type function

Solution inspired by @Serge Ballesta and @John Y comments.

## Open a file in Excel and save it to correct the encoding error 
import win32com.client
import pandas

downloadpath="c:\\firefox_downloads\\"
filename="myfile.xls"

xl=win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False # disables Excel pop up message (for saving the file)
wb = xl.Workbooks.Open(Filename=downloadpath+filename)
wb.SaveAs(downloadpath+filename)
wb.Close
xl.Application.DisplayAlerts = True  # enables Excel pop up message for saving the file

df = pandas.ExcelFile(downloadpath+filename).parse('Sheet1') # Read XLSX file

Thank you all!

What does pd mean?? What

pandas is made for data science. In my opinion, you have to use openpyxl (read and write only xlsx) or xlwt/xlrd (read xls... and write only xls).

from xlrd import open_workbook
book = open_workbook(<math file>)
sheet =.... 

It has several examples with this on Internet...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM