
Looping through a directory of .xlsx files and appending data from one sheet in each file to a dataframe

I am trying to pull a specific sheet, and specific columns within that sheet, from a set of spreadsheets. Think of a monthly report whose structure stays the same from month to month, with only the date stamp in the file name changing, e.g. Metrics 202001.xlsx and so on.

I am using openpyxl, which after a lot of trial and error is working well. My issue is that I want to write those specific columns to a dataframe or an .xlsx file for a summary.

So I am looping through the workbooks and grabbing the sheet I want for each (thankfully all named the same). Where I am getting tripped up is pulling the specific columns and writing them. Here is my code thus far:

import os
import pandas as pd
import openpyxl

path = os.getcwd()
files = os.listdir(path)
print(path)

files_xlsx = [f for f in files if f.endswith('.xlsx')]
print(files_xlsx)

Sp = pd.DataFrame() #make blank dataframe to fill in
headers = ["Fiscal Month", "Country", "Beginning Balance", "Acquisitions", "Reinstatements", "Terminations", "Delinq"] # fields I want to pull from worksheet within workbook


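for f in files_xlsx:
    wb = openpyxl.load_workbook(filename=f)
    ws = wb['Metrics']

    # column 2 is hardcoded here; this is the part that should become dynamic
    for col_cells in ws.iter_cols(min_col=2, max_col=2, max_row=ws.max_row):
        for cell in col_cells:
            print(cell.value)  # placeholder: the values still need to be collected and written out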

I would like to fill in the min_col and max_col values dynamically rather than hardcoding them. From there I would write either to the dataframe I created or to a new Excel file. Any help would be greatly appreciated, as I can see this code having more applications than just the project I am working on. Thanks!
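One possible way to avoid hardcoding min_col and max_col is to read the header row first and look up where each wanted header sits. The sketch below assumes the headers are in row 1 of the sheet; the function name read_columns_by_header and its header_row parameter are hypothetical and not part of openpyxl.

```python
import openpyxl
import pandas as pd


def read_columns_by_header(path, sheet_name, wanted, header_row=1):
    """Return a DataFrame with only the columns whose header is in `wanted`.

    Hypothetical helper: assumes the headers sit in `header_row`.
    """
    wb = openpyxl.load_workbook(filename=path, data_only=True)
    ws = wb[sheet_name]

    # values_only=True yields plain tuples of cell values, one per row
    rows = ws.iter_rows(min_row=header_row, values_only=True)
    header = next(rows)

    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"headers not found in {path}: {missing}")

    # 0-based position of each wanted header inside a row tuple
    positions = [header.index(name) for name in wanted]

    # the remaining rows are data; keep only the wanted positions
    records = [[row[i] for i in positions] for row in rows]
    wb.close()
    return pd.DataFrame(records, columns=list(wanted))
```

Called as read_columns_by_header(f, 'Metrics', headers) inside the loop, this would return one dataframe per workbook, which can then be collected in a list and combined.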

It seems I figured this out, thanks to the poster who answered a very similar question!

import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)
print(path)

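files_xlsx = [f for f in files if f.endswith('.xlsx')]

frames = []  # one dataframe per workbook

for f in files_xlsx:
    data = pd.read_excel(f, sheet_name="Sponsorship Metrics")
    frames.append(data)

# DataFrame.append was removed in pandas 2.0, so combine with pd.concat
df = pd.concat(frames, ignore_index=True)  # ignore_index renumbers the rows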
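The answer above reads every column of the sheet. To keep only the specific columns the question asks about, one option is the usecols parameter of pd.read_excel, which accepts a list of header names. This is a minimal sketch, not the original poster's method: the function name combine_reports and the extra source_file column are hypothetical additions, and it assumes every workbook contains all of the requested headers.

```python
from pathlib import Path

import pandas as pd


def combine_reports(folder, sheet_name, columns):
    """Stack the chosen columns from one sheet of every .xlsx file in `folder`."""
    frames = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        if path.name.startswith("~$"):
            continue  # skip the lock files Excel creates for open workbooks
        # usecols with a list of strings selects columns by header name
        data = pd.read_excel(path, sheet_name=sheet_name, usecols=columns)
        data["source_file"] = path.name  # hypothetical extra: record the origin
        frames.append(data)

    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=list(columns) + ["source_file"])
    return pd.concat(frames, ignore_index=True)
```

The result can be written out with df.to_excel("summary.xlsx", index=False). If the summary is saved into the same folder as the reports, it will be picked up as an input on the next run, so a separate output folder is probably safer.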

