
How to properly read Excel files with openpyxl?

I am trying to read all .xlsx files in a specific directory. The idea is to load every sheet of every Excel file in that directory as a pandas DataFrame, and then store the sheets from all reports in a dictionary keyed by sheet name.

In my attempt, the error BadZipFile: File is not a zip file keeps arising.

What am I missing?

Read the reports and concatenate them for every Excel sheet:

import os
import openpyxl
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook

############### path settlement and file names ##########
path_reportes = 'Reports/xlsx_folder'
file_names = os.listdir(path_reportes)
overall_df = dict()

############## concatenate all reports ##################

for file_name in file_names:
    data_file_path = os.path.join(path_reportes, file_name)
    
    # Start by opening the spreadsheet and selecting the main sheet
    workbook = load_workbook(filename=data_file_path)
    #sheet = workbook.active

    # Save the spreadsheet
    workbook.save(filename=data_file_path)
    df_report_dict = pd.read_excel(data_file_path, sheet_name=None, engine='openpyxl')
    
    for key in df_report_dict:
        df_report_dict[key]['report_name'] = file_name
        try:
            overall_df[key] = overall_df[key].append(df_report_dict[key], ignore_index=True)
        except:
            overall_df[key] = df_report_dict[key]

This produces the following error:

BadZipFile: File is not a zip file

Full traceback:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-6-5e32988240ae> in <module>
     10 
     11     # Start by opening the spreadsheet and selecting the main sheet
---> 12     workbook = load_workbook(filename=data_file_path)
     13     #sheet = workbook.active
     14 

/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in load_workbook(filename, read_only, keep_vba, data_only, keep_links)
    314     """
    315     reader = ExcelReader(filename, read_only, keep_vba,
--> 316                         data_only, keep_links)
    317     reader.read()
    318     return reader.wb

/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in __init__(self, fn, read_only, keep_vba, data_only, keep_links)
    122     def __init__(self,  fn, read_only=False, keep_vba=KEEP_VBA,
    123                   data_only=False, keep_links=True):
--> 124         self.archive = _validate_archive(fn)
    125         self.valid_files = self.archive.namelist()
    126         self.read_only = read_only

/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py in _validate_archive(filename)
     94             raise InvalidFileException(msg)
     95 
---> 96     archive = ZipFile(filename, 'r')
     97     return archive
     98 

/usr/local/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1129         try:
   1130             if mode == 'r':
-> 1131                 self._RealGetContents()
   1132             elif mode in ('w', 'x'):
   1133                 # set the modified flag so central directory gets written

/usr/local/lib/python3.6/zipfile.py in _RealGetContents(self)
   1196             raise BadZipFile("File is not a zip file")
   1197         if not endrec:
-> 1198             raise BadZipFile("File is not a zip file")
   1199         if self.debug > 1:
   1200             print(endrec)

BadZipFile: File is not a zip file
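
The traceback ends in `zipfile` because an .xlsx file is a ZIP archive, and openpyxl opens it with `ZipFile` before parsing anything. The error therefore means that at least one file in the directory has a supported extension but is not a valid ZIP archive. Possible causes, none of them confirmed by the question, include an Excel lock file such as `~$report.xlsx` left behind by an open workbook, a truncated download, a password-protected workbook, or a different format saved with an .xlsx extension. A minimal sketch for locating the offending files with the standard library only; the helper name `find_non_zip_entries` is hypothetical:

```python
import os
import zipfile


def find_non_zip_entries(directory):
    """Return the names of regular files in `directory` that are not valid ZIP archives."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        if not zipfile.is_zipfile(path):
            bad.append(name)
    return bad


if __name__ == "__main__":
    folder = "Reports/xlsx_folder"  # path taken from the question
    if os.path.isdir(folder):
        for name in find_non_zip_entries(folder):
            print("not a ZIP archive:", name)
```

Any name this prints is a file that `load_workbook` cannot open as a workbook, so it is worth inspecting before changing the loading code.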

I tried to replicate your setup with dummy Excel files and, after a small modification to your code, I did not get the error. The version below imports os and builds the folder path from the current working directory. Try it to see if it solves your problem:

import openpyxl
from openpyxl import Workbook
import pandas as pd
from openpyxl import load_workbook
import os

############### path settlement and file names ##########
path_reportes = os.path.join(os.getcwd(), 'Reports', 'xlsx_folder')
file_names = os.listdir(path_reportes)
overall_df = dict()

############## concatenate all reports ##################

for file_name in file_names:
    data_file_path = os.path.join(path_reportes, file_name)
    
    # Start by opening the spreadsheet and selecting the main sheet
    workbook = load_workbook(filename=data_file_path)
    #sheet = workbook.active

    # Save the spreadsheet
    workbook.save(filename=data_file_path)
    df_report_dict = pd.read_excel(data_file_path, sheet_name=None, engine='openpyxl')
    
    for key in df_report_dict:
        df_report_dict[key]['report_name'] = file_name
        if key in overall_df:
            # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
            overall_df[key] = pd.concat([overall_df[key], df_report_dict[key]], ignore_index=True)
        else:
            overall_df[key] = df_report_dict[key]
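
If the directory can contain files that are not real workbooks, a more defensive variant may help. The following is a sketch under assumptions, not a confirmed fix for the question: it skips Excel lock files (names starting with `~$`), files without an .xlsx extension, and files that fail `zipfile.is_zipfile`. It also drops the load-and-save round trip through `load_workbook`, which is not needed just to read the data, and it uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0. The function name `load_reports` is hypothetical.

```python
import os
import zipfile

import pandas as pd


def load_reports(directory):
    """Read every sheet of every valid .xlsx file in `directory`.

    Returns a dict mapping sheet name -> one DataFrame holding that sheet's
    rows from all reports, with a 'report_name' column naming the source file.
    """
    frames_by_sheet = {}
    for file_name in sorted(os.listdir(directory)):
        path = os.path.join(directory, file_name)
        # Skip Excel lock files (~$...) and anything without an .xlsx extension.
        if file_name.startswith("~$") or not file_name.lower().endswith(".xlsx"):
            continue
        # Skip files whose content is not a ZIP archive (corrupt or mislabeled).
        if not zipfile.is_zipfile(path):
            print("skipping (not a valid .xlsx archive):", file_name)
            continue
        sheets = pd.read_excel(path, sheet_name=None, engine="openpyxl")
        for sheet_name, df in sheets.items():
            df["report_name"] = file_name
            frames_by_sheet.setdefault(sheet_name, []).append(df)
    # Concatenate once per sheet instead of growing a DataFrame in the loop.
    return {
        sheet_name: pd.concat(frames, ignore_index=True)
        for sheet_name, frames in frames_by_sheet.items()
    }
```

Calling `load_reports('Reports/xlsx_folder')` would return the same kind of dictionary as `overall_df` above, while reporting any skipped file by name.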
