
OpenPyXL - ReadOnly: How to skip empty rows without knowing when they occur?

I'm pretty new to programming, so please bear with me if my code is not nice and the answer is too obvious. :)

I want to parse an Excel file into a dictionary so I can later access the values by key. I won't know how the Excel file is structured before parsing it, so I can't simply hardcode a skip for a particular empty row, since empty rows can appear anywhere. I'm using Python 3 and openpyxl (read-only mode). This is my code:

from openpyxl import load_workbook
import pprint


# path to file
c = "test.xlsx"
wb = load_workbook(filename=c, read_only=True, data_only=True)

# nested dictionary that holds the parsed values
data = {}
# list of worksheet names
wsname = []
# values in rows per worksheet
valuename = []


# these odd start values are deliberate: pprint sorts keys as strings, so "Row: 10" would otherwise come before "Row: 2"
# counter for row
k = 9
# counter for column
i = 10

# strips the .xlsx extension from the file name
workbook = c.split(".")[0]

data[workbook] = {}
for ws in wb.worksheets:
    # takes worksheet name and parses it into the wsname list
    wsname.append(ws.title)
    wsrealname = wsname.pop()
    worksheet = wsrealname
    data[workbook][worksheet] = {}
    for row in ws.rows:
        k += 1
        for cell in row:
            # reads value per row and column
            data[workbook][worksheet]["Row: " + str(k) + " Column: " + str(i)] = cell.value
            i += 1
        i = 10
    k = 9

pprint.pprint(data)
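The comment about pprint organizing the numbers "weird" comes down to string sorting: pprint sorts dictionary keys before printing, and string keys compare character by character, so "Row: 10" lands before "Row: 2". A small sketch, assuming you are free to change the key format, showing that (row, column) tuple keys sort numerically, which would make the 9 and 10 offsets unnecessary:

```python
import pprint

# String keys sort lexicographically: "1" < "2", so "Row: 10" comes before "Row: 2".
string_keys = {"Row: 2 Column: 1": "b", "Row: 10 Column: 1": "a"}

# Tuple keys compare element by element as integers, so row 2 sorts before row 10.
tuple_keys = {(10, 1): "a", (2, 1): "b"}

print(sorted(string_keys))  # ['Row: 10 Column: 1', 'Row: 2 Column: 1']
print(sorted(tuple_keys))   # [(2, 1), (10, 1)]

# pprint sorts dictionary keys the same way when it prints a dict.
pprint.pprint(tuple_keys)   # {(2, 1): 'b', (10, 1): 'a'}
```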

With this I get output like this:

    {'test': {'Worksheet1': {'Row: 10 Column: 10': None,
                             'Row: 10 Column: 11': None,
                             'Row: 10 Column: 12': None,
                             'Row: 10 Column: 13': None,
                             'Row: 11 Column: 10': None,
                             'Row: 11 Column: 11': 'Test1',
                             'Row: 11 Column: 12': None,
                             'Row: 11 Column: 13': None}}}

This is the output I want, except that in this example I want to skip the whole of Row 10, since all its values are None and the row is therefore empty.

As mentioned, I don't know where empty rows will occur, so I can't hardcode which rows to skip. In read-only mode, if you print(row), an empty row contains only EmptyCell objects, like this:

(<EmptyCell>, <EmptyCell>, <EmptyCell>, <EmptyCell>)

I tried to have my program use set() to check whether all the "values" in the row are the same:

if len(set(row)) == 1:
.....

but that doesn't solve the issue, because I get this error message:

TypeError: unhashable type: 'ReadOnlyCell'
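The error occurs because set() only accepts hashable objects, and the read-only cell objects are not hashable. The cell values (typically strings, numbers, dates or None) are hashable, so the same idea can be applied to them instead. Note that len(set(...)) == 1 alone would also be true for a row in which every cell holds the same non-empty value, so this minimal sketch compares the set of values with {None}. The function name row_values_all_none is hypothetical, and the SimpleNamespace objects only stand in for openpyxl cells by providing a .value attribute.

```python
from types import SimpleNamespace


def row_values_all_none(row):
    """Return True if every cell in the row has the value None.

    Builds a set from the cell *values* (hashable) rather than from the
    cell objects themselves, which raised TypeError in read-only mode.
    """
    return {cell.value for cell in row} == {None}


# Stand-in cells: only the .value attribute is mimicked here.
empty_row = [SimpleNamespace(value=None) for _ in range(4)]
data_row = [SimpleNamespace(value=v) for v in (None, "Test1", None, None)]
same_value_row = [SimpleNamespace(value="x") for _ in range(4)]

print(row_values_all_none(empty_row))       # True
print(row_values_all_none(data_row))        # False
print(row_values_all_none(same_value_row))  # False, although len(set(...)) == 1
```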

If I compare cell.value with None and exclude every None, I get this output:

{'test': {'Worksheet1': {'Row: 11 Column: 11': 'Test1'}}}

which is not what I want, since I only want to skip cells when the whole row is empty. The output should look like this:

{'test': {'Worksheet1': {'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}

So, could you please help me figure out how to skip cells only when the complete row (and therefore every cell in it) is empty?

Thanks a lot!

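from openpyxl.cell.read_only import EmptyCell

for row in ws:
    # True only if every cell in the row is an EmptyCell (or check whether cell.value is None)
    empty = all(isinstance(cell, EmptyCell) for cell in row)
    if empty:
        continue  # skip this row and move on to the next one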
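Putting the comment's alternative (checking whether the value is None) into the question's loop could look like the following sketch. It assumes rows arrive as sequences of cell-like objects with a .value attribute, as ws.rows yields in read-only mode (EmptyCell.value is None as well, so one check covers both kinds of cell). Rows are numbered with enumerate before deciding whether to skip them, so row 11 keeps its number after row 10 is dropped. parse_rows and its first_row/first_col parameters are hypothetical names chosen to mirror the question's counters starting at 10; SimpleNamespace objects stand in for real cells so the example runs without openpyxl.

```python
from types import SimpleNamespace


def parse_rows(rows, first_row=10, first_col=10):
    """Build the question's {"Row: r Column: c": value} dict, skipping empty rows.

    `rows` is any iterable of rows of cell-like objects with a `.value`
    attribute. Row numbers come from the position in `rows`, so skipped
    rows still advance the counter and later keys keep their numbers.
    """
    result = {}
    for r, row in enumerate(rows, start=first_row):
        # Skip the row only if *every* cell is empty.
        if all(cell.value is None for cell in row):
            continue
        for c, cell in enumerate(row, start=first_col):
            result[f"Row: {r} Column: {c}"] = cell.value
    return result


rows = [
    [SimpleNamespace(value=v) for v in (None, None, None, None)],
    [SimpleNamespace(value=v) for v in (None, "Test1", None, None)],
]
print(parse_rows(rows))
# {'Row: 11 Column: 10': None, 'Row: 11 Column: 11': 'Test1', 'Row: 11 Column: 12': None, 'Row: 11 Column: 13': None}
```

With openpyxl, the inner loop of the question would then reduce to data[workbook][worksheet] = parse_rows(ws.rows).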

NB. In read-only mode, avoid repeated calls like data[workbook][worksheet]['A1'], as each one forces the library to parse the worksheet again.
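One way to follow that advice, sketched under the assumption that the sheet's values fit in memory: stream the worksheet once with iter_rows(values_only=True), keep the values in a list, and serve any coordinate lookups from that list. SheetCache and the FakeWorksheet stand-in (which only imitates iter_rows and counts how often it is called) are hypothetical names for illustration.

```python
class SheetCache:
    """Read a worksheet once and serve 1-based (row, column) lookups from memory."""

    def __init__(self, worksheet):
        # A single pass over the sheet; in read-only mode every new pass
        # parses the worksheet again.
        self.rows = list(worksheet.iter_rows(values_only=True))

    def value(self, row, column):
        """Return the value at 1-based (row, column), or None outside the data."""
        if 1 <= row <= len(self.rows) and 1 <= column <= len(self.rows[row - 1]):
            return self.rows[row - 1][column - 1]
        return None


class FakeWorksheet:
    """Minimal stand-in exposing iter_rows(values_only=True) like openpyxl."""

    def __init__(self, rows):
        self._rows = rows
        self.passes = 0

    def iter_rows(self, values_only=False):
        self.passes += 1
        yield from (tuple(r) for r in self._rows)


ws = FakeWorksheet([(None, None, None, None), (None, "Test1", None, None)])
cache = SheetCache(ws)
print(cache.value(2, 2))  # Test1
print(cache.value(1, 1))  # None
print(ws.passes)          # 1
```

With openpyxl, SheetCache(wb["Worksheet1"]) would replace repeated coordinate lookups on the read-only worksheet.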

Just create a custom generator that yields only non-empty rows:

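def iter_rows_with_data(worksheet):
    # values_only=True yields plain tuples of cell values instead of cell objects
    for row in worksheet.iter_rows(values_only=True):
        # keep the row if at least one cell holds a value; comparing with None
        # avoids dropping rows that contain only 0, False or ""
        if any(value is not None for value in row):
            yield row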
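Because values_only=True yields bare tuples, a generator like this loses each row's position, which the question's "Row: … Column: …" keys depend on. A sketch of a variant that pairs each row with its number via enumerate before filtering, so skipped rows still advance the count. It assumes iteration starts at the sheet's first row; the first_row parameter lets you shift the numbering (e.g. to the question's 10). It also tests for None explicitly, so a row holding only 0 is kept. iter_numbered_rows_with_data and the FakeWorksheet stand-in are hypothetical names.

```python
def iter_numbered_rows_with_data(worksheet, first_row=1):
    """Yield (row_number, values) for rows that contain at least one value.

    Numbering is assigned before filtering, so later rows keep their
    position even when empty rows before them are skipped.
    """
    for number, row in enumerate(worksheet.iter_rows(values_only=True), start=first_row):
        # Explicit None check: any(row) would also skip rows of only 0 or "".
        if any(value is not None for value in row):
            yield number, row


class FakeWorksheet:
    """Minimal stand-in exposing iter_rows(values_only=True) like openpyxl."""

    def __init__(self, rows):
        self._rows = rows

    def iter_rows(self, values_only=False):
        return iter(self._rows)


ws = FakeWorksheet([(None, None, None, None), (None, "Test1", None, None), (0, None, None, None)])
print(list(iter_numbered_rows_with_data(ws, first_row=10)))
# [(11, (None, 'Test1', None, None)), (12, (0, None, None, None))]
```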
