Python Read Excel file with Error - “We Found a problem with some content…”

Here is my problem. We have an Excel-based report in which business users enter comments into two separate fields and select a code from a drop-down. We then have a manual process that collects those files and pushes the comments and codes to a Snowflake table so they can be used in various reports.

I am trying to improve the process with a Python script that collects the files, copies them to a staging_folder location, reads the data from the sheet, appends it all together, does some cleanup, and pushes it to Snowflake. The plan is for this to be completely automated, but this is where we run into issues.

The initial step works perfectly. I have a loop that grabs the files based on the previous business day's date and copies them to a staging folder. There are typically 32 files each day.

The next step reads those files and appends them to a DataFrame. Here is the function that loads the Excel files in my Python script.

import glob
from itertools import islice

import pandas as pd
from openpyxl import load_workbook


def load_files():
    file_list = glob.glob(file_path + r'\*')
    df = pd.DataFrame()
    print("Importing data to Pandas DF...")
    for file in file_list:
        try:
            wb = load_workbook(file)
            ws = wb["Daily Outs"]
            data = ws.values
            cols = next(data)[1:]
            data = list(data)
            idx = [r[0] for r in data]
            data = (islice(r, 1, None) for r in data)
            data_1 = pd.DataFrame(data, index=idx, columns=cols)
            df = df.append(data_1, sort=False)

            print(file + " Imported to Df...")
        except Exception as e:
            print("Error: " + e + " When attempting to open file: " + file)
            # error_notify(e)
    print(df.head(10))
    return df

The problem arises when some of the files are corrupted in some way. When opened manually, these files show an error like the one below.

[Image: error shown when manually opening the corrupted XLSX file]

I thought the try/except in the code above would catch an error like this and alert me through the error_notify(e) function. However, the Python script crashes with an error like this: zipfile.BadZipFile: File is not a zip file. During handling of the above exception, another exception occurred.

There is more to the error, but I only copied and pasted this part in some communication with some folks in the office. It is impossible to replicate the error on our own. I have no idea how the files get corrupted in this way, except that multiple people access the files throughout the day.

The way to make the file readable is completely manual: we must open the file, get that error, hit Yes, and save the file over the existing one. Then we re-launch the script. But since the try/except isn't catching the error and alerting us to the failure, we have to run the script manually to see whether it works.

Two questions. First, am I doing something incorrect in my try/except block? I am admittedly weak in error catching, so my first thought is that there is more I can do there to make it work. Second, is there a Python way to get past that error in the Excel workbook files?

Here is the error text:

Traceback (most recent call last):
  File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 48, in load_files
    wb = load_workbook(file)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 314, in load_workbook
    data_only, keep_links)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1289, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 123, in <module>
    main()
  File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 86, in main
    df_output = df_clean()
  File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 68, in df_clean
    df = load_files()
  File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 61, in load_files
    print("Error: " + e + " When attempting to open file: " + file)
TypeError: can only concatenate str (not "BadZipFile") to str

Your try/except structure is correct. All user-defined exceptions in Python should be classes derived from Exception. See BaseException and Exception in the Python documentation: "Exception (..) All user-defined exceptions should also be derived from this class". See also the exception class hierarchy tree at the end of that section of the Python docs.

If a Python script "crashes" despite an except Exception clause, one possibility is that a library raises an exception that is not derived from the Exception class, something that "should not" happen. That is not the case here: zipfile.BadZipFile does derive from Exception, and the traceback shows that it was caught. The crash comes from the second exception in the traceback, a TypeError raised inside the except block, because "Error: " + e tries to concatenate a str with an exception object; converting it with str(e) avoids that. More generally, you could look at the traceback and catch the offending exception type separately, or find which part of the source code and which library is the cause, fix it, and submit a PR. Here are two examples of a good and a bad way of deriving your own exceptions:

class MyBadError(BaseException):
    """
    my bad exception, do not make yours that way
    """
    pass

instead of the recommended:

class MyGoodError(Exception):
    """
    exception based on the Exception
    """
    pass
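
As a small illustration of why the base class matters, the sketch below reuses the two classes above and checks which of them an except Exception clause intercepts. The helper name caught_by_except_exception is made up for this example; zipfile.BadZipFile is included because it is the exception from the question.

```python
import zipfile


class MyBadError(BaseException):
    """Derived from BaseException, so `except Exception` does not catch it."""


class MyGoodError(Exception):
    """Derived from Exception, so `except Exception` catches it."""


def caught_by_except_exception(exc_type):
    """Return True if `except Exception` intercepts an instance of exc_type."""
    try:
        try:
            raise exc_type("boom")
        except Exception:
            return True
    except BaseException:
        # Only reached when the inner handler did not match.
        return False


print(caught_by_except_exception(MyGoodError))         # True
print(caught_by_except_exception(MyBadError))          # False
print(caught_by_except_exception(zipfile.BadZipFile))  # True
```

The last line shows that BadZipFile is an ordinary Exception subclass, so a plain except Exception handler does see it.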

What exactly corrupts the files is still a bit of a mystery, but the exception in your traceback is not new; see the zipfile.BadZipfile issue in the pandas discussion. Note that xlrd, which pandas uses to read Excel workbook data, has been declared "no-maintainer-ware" by its authors, and in case of any issues the recommendation is to use openpyxl instead or to fix the issues yourself (the pandas maintainers are doing a Pontius Pilate on that, but happily use xlrd as a dependency). I suggest you catch BadZipfile separately from all other exceptions, as a special known corruption error; see the Python error handling tutorial for example code (you have probably already seen it; this is for other readers). If that does not work, I can trace it in the source code of your libraries / Python modules to the exact offending section and find the culprit, if you reach out directly.
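
A minimal sketch of that suggestion, under some assumptions: load_all and read_daily_outs are hypothetical names, not the asker's code, and the reader function takes the first sheet row as column headers instead of reproducing the original index handling. Corrupt workbooks are recorded and skipped instead of stopping the run, and the exception is converted with str(e) before being joined to a string.

```python
import glob
import os
import zipfile


def load_all(folder, loader):
    """Apply loader(path) to every file in folder.

    Returns (results, failures), where failures is a list of
    (path, message) tuples for files that could not be read.
    """
    results, failures = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        try:
            results.append(loader(path))
        except zipfile.BadZipFile as e:
            # Known case: an .xlsx file is a zip archive, so a damaged
            # workbook surfaces as BadZipFile. Record it and move on.
            failures.append((path, "corrupt workbook: " + str(e)))
        except Exception as e:
            # Anything else is unexpected; str(e) avoids the TypeError
            # that str + exception concatenation raises.
            failures.append((path, "unexpected error: " + str(e)))
    return results, failures


def read_daily_outs(path):
    """Hypothetical reader: the "Daily Outs" sheet as a DataFrame."""
    # Imported here so the rest of the sketch runs without these packages.
    import pandas as pd
    from openpyxl import load_workbook

    ws = load_workbook(path)["Daily Outs"]
    rows = ws.values
    header = list(next(rows))  # first row holds the column names
    return pd.DataFrame(list(rows), columns=header)
```

Calling load_all(staging_folder, read_daily_outs) would then return the frames that loaded, ready for pd.concat, plus a list of failures that could be passed to something like error_notify. The corrupt files still need the manual open-and-save repair described in the question; this only makes the failure visible without crashing the script.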
