
Columns missing when looking for hidden columns in an Excel worksheet using openpyxl

I am trying to read only the non-hidden columns of an Excel worksheet and build a DataFrame from them, working with both pandas and openpyxl.

openpyxl does not report consecutive hidden columns through column_dimensions. When columns are hidden as a group, only the first hidden column is returned. For example, if columns E and F are hidden together, E has hidden set to True and F is missing from column_dimensions altogether. As a workaround, I took the difference between all possible columns (up to max_column) and the columns present in column_dimensions to get the missing columns, then concatenated those with the columns flagged as hidden to get 'all' hidden columns.

The problem is that some of the columns missing from column_dimensions are not actually hidden in the Excel sheet, so this approach flags visible columns as hidden. I am not sure how to get a list of only the columns that are really hidden.
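The symptom described above can be reproduced without the original file. The following is a minimal in-memory sketch, not the asker's workbook: it assumes that `column_dimensions.group()` stores a hidden range the same way a workbook saved by Excel does, namely as a single entry keyed by the first column of the group.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hide E and F together as one group (an in-memory stand-in for the real file).
ws.column_dimensions.group("E", "F", hidden=True)

# Keys that actually have an entry in column_dimensions.
stored = list(ws.column_dimensions.keys())
dim = ws.column_dimensions["E"]

print(stored)                         # letters that have their own entry
print(dim.hidden, dim.min, dim.max)   # hidden flag and the 1-based span of the entry
print("F" in ws.column_dimensions)    # membership test does not create an entry
```

The entry for E carries the span of the whole group in `min` and `max`, which is why F never shows up as a key of its own.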

Here is the code I have written:

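from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

# Read the file and worksheet (path is the location of the .xlsx file)
wb = load_workbook(path, read_only=False)
ws = wb['Overview']

# Letters of every column up to the last used one
mx_col = ws.max_column
col_indx = [get_column_letter(i) for i in range(1, mx_col + 1)]

# Columns that have a column_dimensions entry, and those flagged as hidden
hid_cols = []
col_vals = []
for col, dimension in ws.column_dimensions.items():
    col_vals.append(col)
    if dimension.hidden:
        hid_cols.append(col)

# Columns without an entry are assumed hidden (this is the step that over-reports)
diff = list(set(col_indx) - set(col_vals))
hidden_columns = diff + hid_cols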

It appears that ws.column_dimensions.items() does not return an entry for every column in the sheet. I was able to find all hidden columns by iterating over every column in the sheet and testing whether it is hidden. As explained in this answer, Excel merges the definitions of grouped columns into a single entry, but you can use the max attribute of that entry to find the last column in the group. Therefore, once a hidden column has been found, the rest of its group follows from max.

import openpyxl as op
from openpyxl.utils import get_column_letter

wb = op.load_workbook("Date_format.xlsx")
ws = wb["Sheet1"]

max_col = ws.max_column
cols = [get_column_letter(i) for i in range(1, max_col+1)]

# Find hidden columns
hidden_cols = []
last_hidden = 0
for i, col in enumerate(cols):
    # Column is hidden
    if ws.column_dimensions[col].hidden:
        hidden_cols.append(col)
        # Last column in the hidden group
        last_hidden = ws.column_dimensions[col].max
    # Appending column if more columns in the group
    elif i+1 <= last_hidden:
        hidden_cols.append(col)
visible_cols = [col for col in cols if col not in hidden_cols]
print("Columns:\t\t", cols)
print("Hidden columns:\t", hidden_cols)
print("Visible columns:", visible_cols)

>>>
Columns:         ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
Hidden columns:  ['B', 'D', 'E', 'G', 'H', 'I']
Visible columns: ['A', 'C', 'F', 'J']

Or a more nested version of the for loop (which is less Pythonic):

hidden_cols = []
for col in cols:
    dim = ws.column_dimensions[col]
    if dim.hidden:
        # Add every column in the group, from its first (min) to its last (max)
        for col_num in range(dim.min, dim.max + 1):
            hidden_cols.append(get_column_letter(col_num))
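To get from the hidden columns to the DataFrame the question asks for, one option is to walk only the entries that column_dimensions already stores and expand each hidden entry over its min..max span. The sketch below is an illustration rather than part of the original answer. The helper names `hidden_column_indices` and `visible_usecols` are hypothetical, and it assumes every dimension has `min` and `max` populated, as is the case for a workbook loaded from disk. Stand-in objects replace a real worksheet so the example runs on its own.

```python
from types import SimpleNamespace


def hidden_column_indices(dimensions):
    """Return the 1-based indices of all hidden columns.

    `dimensions` is any iterable of objects with `hidden`, `min` and `max`
    attributes, e.g. `ws.column_dimensions.values()`.
    Assumption: `min` and `max` are populated, as they are for dimensions
    loaded from a saved workbook.
    """
    hidden = set()
    for dim in dimensions:
        if dim.hidden:
            # One entry can cover a whole run of columns (min..max inclusive).
            hidden.update(range(dim.min, dim.max + 1))
    return hidden


def visible_usecols(max_col, hidden):
    """Return 0-based positions of the visible columns."""
    return [i - 1 for i in range(1, max_col + 1) if i not in hidden]


# Stand-in objects mimicking the sheet from the example output above:
# B hidden alone, D:E and G:I hidden as groups, 10 columns in total.
dims = [
    SimpleNamespace(hidden=True, min=2, max=2),
    SimpleNamespace(hidden=True, min=4, max=5),
    SimpleNamespace(hidden=True, min=7, max=9),
    SimpleNamespace(hidden=False, min=10, max=10),
]
hidden = hidden_column_indices(dims)
usecols = visible_usecols(10, hidden)
print(sorted(hidden))
print(usecols)
```

With a real workbook, the same helpers would be called as hidden_column_indices(ws.column_dimensions.values()) and visible_usecols(ws.max_column, hidden). The resulting list of 0-based positions can then be passed to pandas as pd.read_excel(path, sheet_name="Overview", usecols=usecols), since read_excel accepts a list of 0-indexed column numbers for usecols.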
