Error (little-endian) reading an XLS file with Python
I download an XLS file from the web using Selenium.
I tried many options I found on Stack Overflow and other websites to read the XLS file:
import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'
And
df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'
And again
from xlrd import open_workbook
book = open_workbook('test.xls')
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'
I have tried different encodings: utf-8, ANSI, utf_16_be, utf16. I have even tried to get the encoding of the file from Notepad and other applications.
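The message appears to come from xlrd's compound-document parser, which compares the two bytes at offset 28 of the file header with b'\xfe\xff'. That would explain why switching text encodings has no effect: the check is on the binary header, not on decoded text. Below is a minimal sketch for inspecting that header. The helper name `inspect_ole2_header` is hypothetical, and the offsets assume the standard OLE2 compound file layout.

```python
# Signature that opens every OLE2 compound document (the container used by .xls).
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# Byte-order marker that xlrd expects at offset 28 of the header.
EXPECTED_BYTE_ORDER = b"\xfe\xff"


def inspect_ole2_header(data: bytes) -> dict:
    """Report the header fields that are relevant to the error message."""
    return {
        "is_ole2": data[:8] == OLE2_SIGNATURE,
        "byte_order": data[28:30],
        "byte_order_ok": data[28:30] == EXPECTED_BYTE_ORDER,
    }


# Usage (the path is an assumption taken from the question):
#     with open("test.xls", "rb") as fh:
#         print(inspect_ole2_header(fh.read(512)))
```

If `is_ole2` is true but `byte_order` is b'\xff\xfe', the file is a real binary workbook with an unusual header field rather than a mis-encoded text file.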
Type of file: Microsoft Excel 97-2003 Worksheet (.xls). I can open the file with Excel without any issue. What's frustrating is that if I open the file with Excel and just press save, I can then read the file with any of the previous Python commands.
I would be really grateful if someone could suggest other ideas I could try. I need to open this file with a Python script only.
Thanks, Max
Solution (somewhat messy but simple) that could potentially work for any type of Excel file:
Call VBA from Python to open and save the file in Excel. Excel "cleans up" the file, and Python can then read it with any of the usual Excel-reading functions.
Solution inspired by comments from @Serge Ballesta and @John Y.
## Open a file in Excel and save it to correct the encoding error
import win32com.client
import pandas

downloadpath = "c:\\firefox_downloads\\"
filename = "myfile.xls"

xl = win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False  # suppress Excel's pop-up message when overwriting the file
wb = xl.Workbooks.Open(Filename=downloadpath + filename)
wb.SaveAs(downloadpath + filename)
wb.Close()  # explicit method call to close the workbook
xl.Application.DisplayAlerts = True  # re-enable Excel's pop-up messages

df = pandas.ExcelFile(downloadpath + filename).parse('Sheet1')  # read the re-saved XLS file
Thank you all!
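For machines without Excel, one thing that might be worth trying is to correct the two header bytes in memory before handing the data to the reader. This is an untested assumption, not a confirmed fix: it only helps if the byte-order field is the sole problem with the file. The names `patch_byte_order` and `read_patched_xls` are hypothetical.

```python
import io

# Signature that opens every OLE2 compound document (the container used by .xls).
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def patch_byte_order(data: bytes) -> bytes:
    """Return a copy of data with the marker at offset 28 set to b'\\xfe\\xff'."""
    if data[:8] != OLE2_SIGNATURE:
        raise ValueError("not an OLE2 compound document")
    return data[:28] + b"\xfe\xff" + data[30:]


def read_patched_xls(path, sheet_name=0):
    """Read an .xls file through pandas after patching its header in memory."""
    import pandas as pd  # imported lazily so the patch helper works without pandas

    with open(path, "rb") as fh:
        patched = patch_byte_order(fh.read())
    # The original file on disk is left untouched; only the in-memory copy changes.
    return pd.read_excel(io.BytesIO(patched), sheet_name=sheet_name, engine="xlrd")
```

If the reader still fails after the patch, the file likely differs from a standard workbook in other ways, and the Excel round trip above remains the safer route.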
What does pd mean?
pandas is made for data science. In my opinion, you should use openpyxl (reads and writes xlsx only) or xlwt/xlrd (xlrd reads xls, xlwt writes xls only).
from xlrd import open_workbook
book = open_workbook(<math file>)
sheet =....
There are several examples of this on the Internet...
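Note that pandas itself delegates to these libraries (xlrd for .xls, openpyxl for .xlsx), which is why the pandas calls in the question raise the same xlrd error. A small sketch that makes the choice explicit through the `engine` parameter of `pandas.read_excel`; the helper names `pick_engine` and `read_sheet` are hypothetical, and `pd` is simply the conventional alias for pandas.

```python
from pathlib import Path

# Which reader library handles which extension.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary workbooks
    ".xlsx": "openpyxl",  # Office Open XML workbooks
    ".xlsm": "openpyxl",  # macro-enabled Office Open XML workbooks
}


def pick_engine(path) -> str:
    """Return the pandas engine name for a workbook, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"unsupported workbook extension: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]


def read_sheet(path, sheet_name=0):
    """Read one sheet into a DataFrame with an explicitly chosen engine."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, sheet_name=sheet_name, engine=pick_engine(path))
```

Switching between pandas and xlrd therefore does not change how the file header is parsed; only fixing the file itself does.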