Columns missing when looking for hidden columns in an Excel worksheet using openpyxl

I am trying to read only the non-hidden columns of an Excel worksheet and build a DataFrame from them. I am working with both pandas and openpyxl.

openpyxl does not report consecutive hidden columns through column_dimensions. If the columns were hidden as a group, only the first hidden column is returned. For example, if columns E and F are hidden as a group, E has hidden set to True and F is missing from the list of columns. As a workaround, I took the difference between all possible column letters and the columns present in column_dimensions to get the missing hidden columns, then concatenated that with the columns whose hidden flag is set to get 'all' hidden columns.

The problem is that some of the columns missing from column_dimensions are not actually hidden in the Excel sheet, so this approach reports visible columns as hidden. I am not sure how to get a list of only the columns that are really hidden.

Here is the code I have written:

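from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

# reading the file and worksheet
wb = load_workbook(path, read_only=False)
ws = wb['Overview']

# all column letters up to the last used column
mx_col = ws.max_column
col_indx = []
for i in range(1, mx_col + 1):
    num = get_column_letter(i)
    col_indx.append(num)

# columns that have an entry in column_dimensions, and those flagged hidden
hid_cols = []
col_vals = []
for col, dimension in ws.column_dimensions.items():
    col_vals.append(col)
    if dimension.hidden:
        hid_cols.append(col)

# columns without an entry are assumed hidden (this is the step that overcounts)
diff = list(set(col_indx) - set(col_vals))
hidden_columns = diff + hid_cols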

It appears that ws.column_dimensions.items() does not return an entry for every column in the sheet, so a missing entry does not mean the column is hidden. I was able to find all hidden columns by iterating over every column in the sheet and testing whether it is hidden. As explained in this answer, Excel merges the definitions of grouped columns into a single entry, keyed by the first column of the group, and the max attribute of that entry gives the last column in the group. So once a hidden column has been found, the rest of its group can be found from max.

import openpyxl as op
from openpyxl.utils import get_column_letter

wb = op.load_workbook("Date_format.xlsx")
ws = wb["Sheet1"]

max_col = ws.max_column
cols = [get_column_letter(i) for i in range(1, max_col+1)]

# Find hidden columns
hidden_cols = []
last_hidden = 0
for i, col in enumerate(cols):
    # Column is hidden
    if ws.column_dimensions[col].hidden:
        hidden_cols.append(col)
        # Last column in the hidden group
        last_hidden = ws.column_dimensions[col].max
    # Appending column if more columns in the group
    elif i+1 <= last_hidden:
        hidden_cols.append(col)
visible_cols = [col for col in cols if col not in hidden_cols]
print("Columns:\t\t", cols)
print("Hidden columns:\t", hidden_cols)
print("Visible columns:", visible_cols)

>>>
Columns:         ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
Hidden columns:  ['B', 'D', 'E', 'G', 'H', 'I']
Visible columns: ['A', 'C', 'F', 'J']

Alternatively, a nested version of the loop expands each hidden group directly from min to max (arguably less Pythonic):

# Replaces the loop above, so start from an empty list
hidden_cols = []
for col in cols:
    dim = ws.column_dimensions[col]
    if dim.hidden:
        # Expand the whole group covered by this entry, from min to max
        for col_num in range(dim.min, dim.max + 1):
            hidden_cols.append(get_column_letter(col_num))
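
Since the original goal was a DataFrame of only the visible columns, the same min/max idea can be wrapped in two small helpers. This is a minimal sketch rather than a tested recipe: hidden_column_indices and visible_column_positions are hypothetical names, and the code assumes every hidden entry has min and max populated, which should hold for a workbook loaded from a file. It walks the existing entries instead of looking columns up by letter, which as far as I know avoids creating empty entries in column_dimensions as a side effect.

```python
def hidden_column_indices(column_dimensions):
    """Return the sorted 1-based indices of all hidden columns.

    `column_dimensions` is expected to behave like ws.column_dimensions:
    a mapping whose values expose `hidden`, `min` and `max`.
    Assumption: `min` and `max` are set on every hidden entry, as they
    should be for a workbook loaded from a file.
    """
    hidden = set()
    # Walk only the entries that already exist; no lookups by column letter.
    for dim in list(column_dimensions.values()):
        if dim.hidden:
            # One entry may stand for a whole group of columns (min..max).
            hidden.update(range(dim.min, dim.max + 1))
    return sorted(hidden)


def visible_column_positions(column_dimensions, max_column):
    """Return the 0-based positions of the columns that are not hidden."""
    hidden = set(hidden_column_indices(column_dimensions))
    return [i - 1 for i in range(1, max_column + 1) if i not in hidden]
```

With a loaded worksheet, visible_column_positions(ws.column_dimensions, ws.max_column) gives 0-based positions. pandas.read_excel accepts a list of 0-indexed column numbers in usecols, so the result should be usable there, assuming the data starts in column A so that positions line up.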
