
Bug in writing Excel data to Dictionary in Python with OpenPyXL

Sorry to hit you with a bunch of esoteric code, but I have come across a bug which I have no idea how to fix.

Basically, I want to read individual cells in spreadsheet columns and write their data to a corresponding dictionary (called dataSet).

I created a function to do this:

def loopCol(col, start_offset, write_list):
    '''
    Loop through 1 column (col) until it ends. Skip header by start_offset.
    Write data to list within DataSet dict
    '''
    from openpyxl.utils import column_index_from_string

    # Create list and capture str of its name
    list_string = str(write_list)
    print(list_string)
    if list_string not in dataSet:
        raise KeyError('List name not within DataSet Keys')
    write_list = []

    # Loop through column, capturing each cell's value
    # Use start_offset to skip header cells
    for i in range(column_index_from_string(col) + start_offset,
                   sheet.max_row + 1):
        listItem = sheet[col + str(i)].value
        print(listItem)
        if listItem != None:
            if isinstance(listItem, datetime.datetime):
                listItem = listItem.strftime('%d/%m/%Y')
                write_list.append(listItem)
            else:
                write_list.append(listItem)

    # Write data to dataSet
    for list_index in write_list:
        dataSet[list_string] = [list_index for list_index in write_list]

loopCol('A', 0, 'dates')
loopCol('B', 0, 'ph')
loopCol('C', 0, 'water_level')
loopCol('D', 0, 'salinity')
loopCol('E', 1, 'conductivity')
loopCol('F', 0, 'tds')

So in theory, this should go through all the cells in one column, and if there's some value in them, write that value to its corresponding place in this dictionary:

dataSet = {
    'dates': [],
    'ph': [],
    'water_level': [],
    'salinity': [],
    'conductivity': [],
    'tds': []
}

However, there's a problem. When all is said and done the dictionary looks like:

{'ph': [3.4, 2.1, 7], 'salinity': [2.2, 1.2], 'conductivity': [5.3], 'water_level': ['2m', '3m', '1m'], 'tds': [], 'dates': ['Date', '21/01/2016', '28/01/2012', '06/03/2012']}

Now I know there are exactly 3 cells with values in each column. However, some aren't making it into the dictionary. 'salinity' only gets 2 values, 'conductivity' only gets one, and 'tds' is empty. These do happen to be the last entries in the dataSet dict, so maybe that's part of the reason. But I just can't figure out where the bug in the logic is.

[Image: screenshot of the spreadsheet file]

Can someone please help? I really want to impress my boss ;) (I don't work in IT, so any computer wizardry that makes people's lives easier is met with wonder and awe).

If I didn't do well enough to explain exactly what the code is doing let me know and I'll try to clarify.

You could try something like this:

from openpyxl import load_workbook

def colValues(sheet, keys, offsets=None):
    if not offsets:
        # Default every offset to 0, in the same order as keys
        offsets = [0] * len(keys)
    if len(offsets) != len(keys):
        # If offsets given, fail if length mismatch
        raise ValueError('offsets must have one entry per key')

    res = {}
    for column in sheet.columns:
        # Store current column header field (i.e. its name)
        ch = column[0].value
        if ch not in keys:
            # Skip early: no need for any further work if this column's
            # data is not desired in the result set.
            continue
        # Append all row values to the result dict with respect to the
        # given column offset. Note: Lowest possible row index is 1,
        # because here we assume that header fields are present.
        res[ch] = [c.value for c in column[offsets[keys.index(ch)] + 1:]]
    return res

if __name__ == '__main__':
    xlsx = 'test.xlsx'
    ws = load_workbook(xlsx)['Sheet1']

    ds = colValues(ws, ['foo', 'bar'], [0, 1])
    print(ds)

For my small test, this yields the correct number of items per column. Note that the key 'bar' has one item fewer here, because its offset is higher in the above function call.

{u'foo': [2.3, 3.5, 5.6, 7.9], u'bar': [6.2, 3.6, 9L]}

In addition, the code is way lighter.
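
As a side note on the original loopCol: its loop starts at column_index_from_string(col) + start_offset, so the first row it reads depends on the column letter (row 1 for 'A', row 6 for 'F'). That would explain the 'Date' header showing up under 'dates', and it is the likely reason the later columns come up short. Below is a minimal sketch of a row-based version, not a drop-in fix. The names read_column, build_data_set and first_row are made up for illustration, and sheet is assumed to behave like an openpyxl worksheet (sheet['A2'].value and sheet.max_row).

```python
import datetime


def read_column(sheet, col, first_row=2):
    """Collect the non-empty values of one column, starting at first_row.

    Assumption: `sheet` behaves like an openpyxl worksheet, i.e.
    sheet['A2'].value returns a cell value and sheet.max_row is the last
    used row. first_row=2 skips a single header row.
    """
    values = []
    # The start is a row number, so it is the same for every column letter.
    for row in range(first_row, sheet.max_row + 1):
        value = sheet[f"{col}{row}"].value
        if value is None:
            continue  # skip empty cells
        if isinstance(value, datetime.datetime):
            value = value.strftime('%d/%m/%Y')
        values.append(value)
    return values


def build_data_set(sheet, columns, first_row=2):
    """Build the result dict from a {key: column letter} mapping."""
    return {key: read_column(sheet, col, first_row)
            for key, col in columns.items()}
```

With the layout from the question, this might be called as build_data_set(sheet, {'dates': 'A', 'ph': 'B', 'tds': 'F'}). A column whose data starts lower down can be read on its own with read_column(sheet, 'E', first_row=3).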

