OpenPyXL - ReadOnly: How to skip empty rows without knowing when they occur?
I'm pretty new to programming, so please bear with me if my code is not nice and the answer is too obvious. :)
I want to parse an Excel file into a dictionary so I can later access the values by key. I won't know how the Excel file is structured before parsing it, so I can't hardcode the code to skip a particular empty row, because empty rows can occur anywhere. I am using Python 3 and OpenPyXL (read-only mode). This is my code:
from openpyxl import load_workbook
import pprint

# path to file
c = "test.xlsx"
wb = load_workbook(filename=c, read_only=True, data_only=True)
# nested dictionary that will hold the parsed values
data = {}
# list of worksheet names
wsname = []
# values in rows per worksheet
valuename = []
# I started the counters at these odd numbers because pprint orders the keys strangely when 1s and 10s are mixed
# counter for row
k = 9
# counter for column
i = 10
# strips the .xlsx extension from the file name
workbook = c.split(".")[0]
data[workbook] = {}
for ws in wb.worksheets:
    # takes the worksheet name and puts it into the wsname list
    wsname.append(ws.title)
    wsrealname = wsname.pop()
    worksheet = wsrealname
    data[workbook][worksheet] = {}
    for row in ws.rows:
        k += 1
        for cell in row:
            # reads the value per row and column
            data[workbook][worksheet]["Row: " + str(k) + " Column: " + str(i)] = cell.value
            i += 1
        i = 10
    k = 9
pprint.pprint(data)
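About the comment on the odd starting numbers: pprint sorts dictionary keys, and string keys are compared character by character. That puts "Row: 10" before "Row: 2". Starting the counters at 10 only postpones the problem, which comes back at row 100 ("Row: 100" sorts before "Row: 11"). One possible alternative, which is a suggestion and not something from the question, is to key on (row, column) tuples, because these sort numerically. A small illustration in plain Python:

```python
import pprint

# String keys sort character by character, so "Row: 10" comes before "Row: 2".
string_keys = {"Row: 2": "b", "Row: 10": "a"}
print(sorted(string_keys))  # ['Row: 10', 'Row: 2']

# Tuple keys (row, column) sort numerically, so counters can simply start at 1.
tuple_keys = {(10, 1): "a", (2, 1): "b"}
print(sorted(tuple_keys))  # [(2, 1), (10, 1)]
pprint.pprint(tuple_keys)  # {(2, 1): 'b', (10, 1): 'a'}
```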
Running my code, I get this output:
{'test': {'Worksheet1': {'Row: 10 Column: 10': None,
                         'Row: 10 Column: 11': None,
                         'Row: 10 Column: 12': None,
                         'Row: 10 Column: 13': None,
                         'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
This is the output I want, except that in this example I want to skip the whole of row 10, because all of its values are None and the row is therefore empty.
As mentioned, I don't know where empty rows will occur, so I can't hardcode a particular row to skip. In read-only mode, if you print(row), the row contains only EmptyCell objects, like this:
(<EmptyCell>, <EmptyCell>, <EmptyCell>, <EmptyCell>)
I tried having my program use set() to check whether all the values in the row are duplicates of each other:
if len(set(row)) == 1:
    .....
But that doesn't solve the issue, because I get this error message:
TypeError: unhashable type: 'ReadOnlyCell'
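Why the set() attempt fails: set() needs hashable items, and the read-only cell objects are not hashable, which is what the TypeError reports. Working on the cell values avoids the error. However, len(set(...)) == 1 is still the wrong test, because a row whose cells all hold the same non-empty value also collapses to a single element. Below is a minimal sketch in plain Python. It uses tuples of values to stand in for what a worksheet yields through iter_rows(values_only=True). row_is_empty is a hypothetical helper name.

```python
def row_is_empty(values):
    """Hypothetical helper: True only if every value in the row is None."""
    return all(value is None for value in values)


rows = [
    (None, None, None, None),     # genuinely empty row
    (None, "Test1", None, None),  # row with data
    ("x", "x", "x", "x"),         # identical, non-empty values
]

# The set-based test also flags the third row, which is not empty.
print([len(set(row)) == 1 for row in rows])  # [True, False, True]
# Checking explicitly for None flags only the empty row.
print([row_is_empty(row) for row in rows])  # [True, False, False]
```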
If I compare cell.value with None and exclude all the Nones, I get this output:
{'test': {'Worksheet1': {'Row: 11 Column: 11': 'Test1'}}}
That is not what I want, because I only want to skip cells when the whole row is empty. The output should look like this:
{'test': {'Worksheet1': {'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
So, could you please help me figure out how to skip cells only when the complete row (and therefore every cell in it) is empty?
Thanks a lot!
from openpyxl.cell.read_only import EmptyCell

for row in ws:
    empty = all(isinstance(cell, EmptyCell) for cell in row)  # or check if the value is None
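Here is the check placed inside the loop from the question. This is a minimal sketch, and it assumes the rows come from ws.iter_rows(values_only=True), which gives tuples of plain values instead of cell objects, so it tests for None instead of EmptyCell. rows_to_dict is a hypothetical name, and the start value 10 mirrors the counters in the question. Every row is counted, including the skipped ones, so the keys still match the sheet layout.

```python
def rows_to_dict(rows, first_row=10, first_col=10):
    """Hypothetical helper: map 'Row: r Column: c' keys to values,
    leaving out every row in which all values are None."""
    result = {}
    for row_number, values in enumerate(rows, start=first_row):
        # Skip the row only if every cell in it is empty.
        if all(value is None for value in values):
            continue
        for col_number, value in enumerate(values, start=first_col):
            result["Row: " + str(row_number) + " Column: " + str(col_number)] = value
    return result


# With openpyxl this would be called once per worksheet, e.g.
# data[workbook][ws.title] = rows_to_dict(ws.iter_rows(values_only=True))
sample = [(None, None, None, None), (None, "Test1", None, None)]
print(rows_to_dict(sample))  # only the four "Row: 11" keys remain
```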
NB. In read-only mode, avoid repeated calls like data[workbook][worksheet]['A1'], because they force the library to parse the worksheet again and again.
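One way to follow that advice is to read the worksheet once, in a single pass, and keep the values in an ordinary dictionary keyed by (row, column). Later lookups then hit the dictionary instead of re-parsing the file. This is a minimal sketch. load_values is a hypothetical helper, and it only assumes the worksheet provides iter_rows(values_only=True), which openpyxl worksheets (including read-only ones) do. It trades memory for speed, which partly defeats the purpose of read-only mode on very large sheets.

```python
def load_values(worksheet):
    """Hypothetical helper: read every row once and return
    {(row, column): value} with 1-based indices, as in openpyxl."""
    values = {}
    for row_idx, row in enumerate(worksheet.iter_rows(values_only=True), start=1):
        for col_idx, value in enumerate(row, start=1):
            values[(row_idx, col_idx)] = value
    return values


# Usage with openpyxl (not executed here):
# from openpyxl import load_workbook
# ws = load_workbook("test.xlsx", read_only=True, data_only=True).worksheets[0]
# cells = load_values(ws)
# cells[(1, 1)]  # value of A1, without parsing the sheet again
```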
Just create your own generator that yields only the non-empty rows:
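def iter_rows_with_data(worksheet):
    for row in worksheet.iter_rows(values_only=True):
        # compare with None so that rows holding only 0, "" or False still count as data
        if any(value is not None for value in row):
            yield row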
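A generator like this drops the skipped rows, so the original sheet row numbers are lost unless you carry them along. Also note that a plain truthiness check such as any(row) would drop rows whose only values are 0, an empty string or False. Testing against None avoids that. Below is a minimal sketch of a variant that keeps the numbering. iter_numbered_rows_with_data is a hypothetical name, and it accepts any iterable of value tuples, such as ws.iter_rows(values_only=True).

```python
def iter_numbered_rows_with_data(rows, start=1):
    """Hypothetical variant: yield (row_number, values) for every row
    that has at least one non-None value, keeping the sheet's numbering."""
    for row_number, values in enumerate(rows, start=start):
        if any(value is not None for value in values):
            yield row_number, values


rows = [(None, None), (0, None), (None, "Test1")]
print(any((0, None)))  # False: a plain truthiness test would drop row 2
print(list(iter_numbered_rows_with_data(rows)))
# [(2, (0, None)), (3, (None, 'Test1'))]
```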