Can't read Excel files using openpyxl

I have a set of Excel files that all end with a similar last row. That row contains private information about a client (first name, surname, phone), and each Excel file corresponds to one client. I need to build a single Excel file with the data for every client. I decided to automate this and turned to the openpyxl library. I wrote the following code, but it doesn't work correctly.

import openpyxl
import os
import glob
from openpyxl import load_workbook
from openpyxl import Workbook
import openpyxl.styles
from openpyxl.cell import get_column_letter

path_kit = 'prize_input/kit'

#creating single document
prize_info = Workbook()
prize_sheet = prize_info.active

file_array_reciever = []

for file in glob.glob(os.path.join(path_kit, '*.xlsx')):
    file_array_reciever.append(file)

row_num = 1
for f in file_array_reciever:
    f1 = load_workbook(filename=f)
    sheet = f1.active
    for col_num in range (3, sheet.max_column):
        prize_sheet.cell(row=row_num, column=col_num).value = \
            sheet.cell(row=sheet.max_row, column=col_num).value

    prize_info.save("Ex.xlsx")

I get this error:

Traceback (most recent call last):
  File "/Users/zkid18/PycharmProjects/untitled/excel_test.py", line 43, in <module>
    f1 = load_workbook(filename=f)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 183, in load_workbook
    wb.active = read_workbook_settings(archive.read(ARC_WORKBOOK)) or 0
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1229, in read
    with self.open(name, "r", pwd) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1252, in open
    zinfo = self.getinfo(name)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1196, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'xl/workbook.xml' in the archive"

It looks like a problem with reading the file.
I don't understand why it expects an item named 'xl/workbook.xml' in the archive.

Option 1: I overcame this issue by adding read_only=True. Specifically, replace

f1 = load_workbook(filename=f)

with

f1 = load_workbook(filename=f, read_only=True)

Note: Depending on your code, read_only=True can make it very slow. If that is the case for you, you may want to try Option 2.

Option 2: Open the problematic workbook in Excel, then re-save it as a Strict Open XML Spreadsheet (*.xlsx).

Depending on which version you are using, this could be a bug in openpyxl. For example, version 1.6.1 introduced a bug exhibiting this behavior, and reverting to 1.5.8 fixed it. According to an openpyxl ticket, a fix was committed in early 2013, though the ticket doesn't say when it was delivered. I upgraded to 1.6.2 and the error went away.

I found this post while searching for a solution to a similar issue ("There is no item named '[Content_Types].xml' in the archive").

The error message made no sense in terms of my script or the file. My script adds one sheet and updates five more in an existing Excel document. While it was running, I realized I had an error in my code, so I canceled the script mid-run.

After canceling, the existing Excel file exhibited this error. Perhaps you corrupted your Excel file while working out bugs in your script?

To address this, I'm thinking of creating a temporary restore file to fall back on if an error occurs while using openpyxl.
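The restore-file idea could be sketched roughly as follows, using only the standard library. The helper name restore_on_error and the ".bak" suffix are hypothetical choices for illustration, not part of openpyxl. The sketch copies the workbook aside before the script touches it and copies it back if the body raises, including when the script is canceled with Ctrl+C.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def restore_on_error(path):
    """Copy `path` aside; put the copy back if the body raises."""
    backup = path + ".bak"  # hypothetical naming choice
    shutil.copy2(path, backup)
    try:
        yield backup
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl+C).
        shutil.copy2(backup, path)
        raise
    finally:
        # The backup is temporary in both the success and failure case.
        os.remove(backup)
```

The openpyxl load, edit, and save calls would go inside a "with restore_on_error('report.xlsx'):" block, where 'report.xlsx' is a placeholder file name. This cannot help if the process is killed outright, since no cleanup code runs in that case.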

I had the same problem. Make sure the file you are trying to read is not already open in Excel.
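One related thing worth checking, as an assumption about what may be happening rather than a confirmed diagnosis: while a workbook is open, Excel typically keeps a small lock file next to it whose name starts with "~$". A pattern like '*.xlsx' matches those lock files too, and they are not real workbooks. A minimal sketch that skips them (list_workbooks is a hypothetical helper name):

```python
import glob
import os


def list_workbooks(folder):
    """Return .xlsx paths in `folder`, skipping Excel's '~$' lock files."""
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    return sorted(
        p for p in paths
        if not os.path.basename(p).startswith("~$")
    )
```

In the question's script, the result of list_workbooks(path_kit) could stand in for the list built from glob.glob.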

You can use the xlrd library. (Note that xlrd 2.0 and later reads only .xls files, so opening an .xlsx file this way needs an older xlrd release.)

This script lets you transform Excel data into a list of dictionaries:

import xlrd

# on_demand=True loads sheets only when they are requested
workbook = xlrd.open_workbook('your_file.xlsx', on_demand=True)
worksheet = workbook.sheet_by_index(0)

# The first row holds the column names
first_row = []
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))

# Transform the remaining rows into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
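The header-to-dictionary step does not depend on xlrd. As a small sketch, assuming the rows are already available as sequences of cell values (rows_to_dicts is a hypothetical helper name), the same mapping can be written with zip:

```python
def rows_to_dicts(rows):
    """Use the first row as keys and map each later row to a dict."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        # No rows at all, so there is nothing to map.
        return []
    return [dict(zip(header, row)) for row in rows]
```

For example, rows_to_dicts([["name", "phone"], ["Ann", "1"]]) gives a list with one dictionary mapping "name" to "Ann" and "phone" to "1".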

I guess your file was originally in .xls format. You can use

try:
    f1 = load_workbook(filename=f)
except Exception:
    print(f)

to find which file causes this error, then reopen it in Excel and save it as .xlsx.
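The traceback in the question shows openpyxl opening the file as a zip archive and reading the member 'xl/workbook.xml'. Based on that, a file can be checked without openpyxl at all. This is only a rough sketch using the standard zipfile module; check_xlsx is a hypothetical helper name, and passing this check does not guarantee that openpyxl can load the file.

```python
import zipfile


def check_xlsx(path):
    """Return a short diagnosis string for one file path."""
    # An .xlsx file is a zip archive; a renamed legacy .xls or a
    # truncated file fails this first check.
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    with zipfile.ZipFile(path) as archive:
        # This is the member named in the KeyError from the question.
        if "xl/workbook.xml" not in archive.namelist():
            return "zip archive without xl/workbook.xml"
    return "ok"
```

Printing check_xlsx(f) for each file in the loop should point to the files that need to be re-saved from Excel.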

If openpyxl still doesn't work, pandas may.

$ pip install pandas xlrd

Then this code reads the file:

import pandas as pd

# file_path is the path to the workbook you want to read
df = pd.read_excel(file_path)
