This seems tricky for me. Let's say I have, nested in a directory tree, an Excel file with a few non-empty columns. I want to get the sum of all values located in column F with openpyxl:
file1.xlsx
A B C D E F
5
7
11
17
20
29
34
My take on it would be as follows, but it is wrong:
import os
from openpyxl import load_workbook
directoryPath=r'C:\Users\MyName\Desktop\MyFolder' #The main folder
os.chdir(directoryPath)
folder_list=os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath): #Traversing the sub folders
    for name in file:
        if name.endswith(".xlsx"):
            filename = os.path.join(folders, name)
            wb=load_workbook(filename, data_only=True)
            ws=wb.active
            cell_range = ws['F1':'F7'] #Selecting the slice of interest
            sumup=0
            for row in cell_range:
                sumup=sumup+cell.value
Running this, I get NameError: name 'cell' is not defined. How can I work around this?
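The main problem is that you iterate only over the rows, not over the cells within each row. ws['F1':'F7'] returns a tuple of rows, and each row is itself a tuple of cells, so inside your loop only the name row exists and cell is never defined.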
At the end of your code, replace the last two lines with this:
for row in cell_range: # This iterates through rows 1-7
    for cell in row: # This iterates through the columns (cells) in that row
        value = cell.value
        sumup += value
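Putting the pieces together, here is a minimal sketch of the whole task, not a definitive implementation. The names sum_cells and sum_column_f are hypothetical helpers, and the range "F1:F7" is taken from the question. One assumption beyond the original code: cells whose value is None (empty cells, or formulas with no cached result when data_only=True) and text cells are skipped instead of raising a TypeError. The last lines use stand-in objects with a .value attribute so the summing logic can be tried without a workbook.

```python
import os
from types import SimpleNamespace


def sum_cells(cell_range):
    """Sum numeric cell values in a 2D range (a tuple of rows of cells)."""
    total = 0
    for row in cell_range:      # each row is itself a tuple of cells
        for cell in row:        # for a single column there is one cell per row
            if isinstance(cell.value, (int, float)):  # skips None and text
                total += cell.value
    return total


def sum_column_f(root, cell_ref="F1:F7"):
    """Return {path: sum} for every .xlsx file under root (hypothetical helper)."""
    from openpyxl import load_workbook  # imported lazily; requires openpyxl

    totals = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.endswith(".xlsx"):
                path = os.path.join(folder, name)
                ws = load_workbook(path, data_only=True).active
                totals[path] = sum_cells(ws[cell_ref])
    return totals


# Stand-in cells mimicking the column from the question, plus one empty cell.
fake_range = tuple((SimpleNamespace(value=v),) for v in (5, 7, 11, 17, 20, 29, 34, None))
print(sum_cells(fake_range))
```

Keeping one total per file, rather than a single sumup variable that is reset inside the loop, makes it easier to see which workbook contributed which value.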
You mentioned that you suspected the code was not running through each of your Excel files. That is easy to check. Remove all code after
ws=wb.active
and add
print(name + ' : ' + ws.title)
This prints the name of each Excel file together with the title of its active sheet (ws itself is a worksheet object and cannot be concatenated with a string). If more than one line is printed, the walk is finding and opening the Excel files.
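To check the directory traversal separately from the workbook handling, something like the following sketch may help. find_xlsx is a hypothetical helper, and the demo tree with its file names is made up for illustration; it only uses os.walk, so openpyxl is not needed at this stage.

```python
import os
import tempfile


def find_xlsx(root):
    """Return sorted paths of all .xlsx files under root, at any depth."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.endswith(".xlsx"):
                found.append(os.path.join(folder, name))
    return sorted(found)


# Demo on a throwaway directory tree containing empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in ("file1.xlsx", os.path.join("sub", "deeper", "file2.xlsx"), "notes.txt"):
        open(os.path.join(root, rel), "w").close()
    demo_paths = [os.path.relpath(p, root) for p in find_xlsx(root)]
    for path in demo_paths:
        print(path)
```

If the list contains every file you expect, the traversal is fine and any remaining problem lies in how each workbook is read.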