
Bug in writing Excel data to Dictionary in Python with OpenPyXL

Sorry to hit you with a bunch of esoteric code, but I have come across a bug which I have no idea how to fix.

Basically, I want to read individual cells in spreadsheet columns and write their data to a corresponding dictionary (called dataSet).

I created a function to do this:

def loopCol(col, start_offset, write_list):
    '''
    Loop through 1 column (col) until it ends. Skip header by start_offset.
    Write data to list within DataSet dict
    '''
    from openpyxl.utils import column_index_from_string

    # Create list and capture str of its name
    list_string = str(write_list)
    print(list_string)
    if list_string not in dataSet:
        raise KeyError('List name not within DataSet Keys')
    write_list = []

    # Loop through column, capturing each cell's value
    # Use start_offset to skip header cells
    for i in range(column_index_from_string(col) + start_offset,
                   sheet.max_row + 1):
        listItem = sheet[col + str(i)].value
        print(listItem)
        if listItem != None:
            if isinstance(listItem, datetime.datetime):
                listItem = listItem.strftime('%d/%m/%Y')
                write_list.append(listItem)
            else:
                write_list.append(listItem)

    # Write data to dataSet
    for list_index in write_list:
        dataSet[list_string] = [list_index for list_index in write_list]

loopCol('A', 0, 'dates')
loopCol('B', 0, 'ph')
loopCol('C', 0, 'water_level')
loopCol('D', 0, 'salinity')
loopCol('E', 1, 'conductivity')
loopCol('F', 0, 'tds')

So in theory, this should go through all the cells in one column and, if a cell holds a value, write that value to its corresponding place in this dictionary:

dataSet = {
    'dates': [],
    'ph': [],
    'water_level': [],
    'salinity': [],
    'conductivity': [],
    'tds': []
}

However, there's a problem. When all is said and done, the dictionary looks like:

{'ph': [3.4, 2.1, 7], 'salinity': [2.2, 1.2], 'conductivity': [5.3], 'water_level': ['2m', '3m', '1m'], 'tds': [], 'dates': ['Date', '21/01/2016', '28/01/2012', '06/03/2012']}

Now I know there are exactly 3 cells with values in each column. However, some aren't making it into the dictionary: 'salinity' only gets 2 values, 'conductivity' only gets one and 'tds' is empty. These do happen to be the last entries in the dataSet dict, so maybe that's part of the reason. But I just can't figure out where the bug in the logic is.

(Screenshot of the spreadsheet, for context.)

Can someone please help? I really want to impress my boss ;) (I don't work in IT, so any computer wizardry that makes people's lives easier is met with wonder and awe).

If I didn't explain well enough what the code is doing, let me know and I'll try to clarify.

You could try something like this:

from openpyxl import load_workbook


def colValues(sheet, keys, offsets=None):
    if not offsets:
        # Default to an offset of 0 for every key. This must be a list,
        # because offsets are looked up by position below.
        offsets = [0] * len(keys)
    if len(offsets) != len(keys):
        # If offsets are given, fail on a length mismatch
        raise ValueError('offsets and keys must have the same length')

    res = {}
    for column in sheet.columns:
        # Store current column header field (i.e. its name)
        ch = column[0].value
        if ch not in keys:
            # Skip early: No need for any tests if this column's data
            # is not desired in result set.
            continue
        # Append all row values to the result dict with respect to the
        # given column offset. Note: Lowest possible row index is 1,
        # because here we assume that header fields are present.
        res[ch] = [c.value for c in column[offsets[keys.index(ch)] + 1:]]
    return res

if __name__ == '__main__':
    xlsx = 'test.xlsx'
    ws = load_workbook(xlsx)['Sheet1']

    ds = colValues(ws, ['foo', 'bar'], [0, 1])
    print(ds)

For my small test, this yields the correct number of items per column. Note that the key 'bar' has one item fewer here, because its offset is higher in the above function call.

{u'foo': [2.3, 3.5, 5.6, 7.9], u'bar': [6.2, 3.6, 9L]}

In addition, the code is much lighter.
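
As for why the original function loses values: its row loop starts at `column_index_from_string(col) + start_offset`, so a column number is being used as a row number. Column A starts reading at row 1 (which is why the 'Date' header ends up in 'dates'), while column F starts at row 6 and can miss everything above it. The sketch below illustrates this without touching a workbook. It assumes a single-letter stand-in for `column_index_from_string`, a hypothetical helper named `collect_values`, and made-up cell contents, since the real sheet is only visible in the screenshot.

```python
import datetime


def column_number(col):
    """Stand-in for openpyxl.utils.column_index_from_string (single letters only)."""
    return ord(col.upper()) - ord('A') + 1


def buggy_first_row(col, start_offset):
    # Mirrors the start of range() in the question:
    # a column number is used as the first row to read.
    return column_number(col) + start_offset


def collect_values(cells, header_rows=1, date_format='%d/%m/%Y'):
    """Return the non-empty values of one column, skipping header_rows cells.

    `cells` is any iterable of raw cell values. With openpyxl it could be
    something like (c.value for c in sheet['F']); that usage is an assumption
    about the caller, not part of this sketch.
    """
    values = []
    for value in list(cells)[header_rows:]:
        if value is None:
            continue  # empty cell
        if isinstance(value, datetime.datetime):
            value = value.strftime(date_format)
        values.append(value)
    return values


# First row read by the original loop for each call made in the question.
calls = [('A', 0), ('B', 0), ('C', 0), ('D', 0), ('E', 1), ('F', 0)]
first_rows = {col: buggy_first_row(col, offset) for col, offset in calls}
print(first_rows)

# Made-up column contents: one header cell, then three values and a gap.
dates = ['Date', datetime.datetime(2016, 1, 21), datetime.datetime(2012, 1, 28),
         None, datetime.datetime(2012, 3, 6)]
tds = ['TDS', 1.5, None, 2.5, 3.5]
data_set = {'dates': collect_values(dates), 'tds': collect_values(tds)}
print(data_set)
```

Counting header rows from the top of each column, rather than deriving the start row from the column letter, keeps every column on the same footing. Returning a list instead of writing into a global dict also makes the helper easier to test on its own.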


 