Iterate through excel files and sheets and concatenate in Python
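Suppose I have a folder containing multiple Excel files with the extension xlsx or xls. They all share the same header columns a, b, c, d, e, except for some empty sheets in a few of the files.
I want to iterate over all files and sheets (skipping the empty sheets) and concatenate them into a single sheet in one file, output.xlsx.
I have already iterated over all the Excel files and appended them into one file, but how can I also iterate over every sheet when a file has more than one?
I need to integrate the two code blocks below into one. Thanks for your help.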
import glob
import os

import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# method 1: filter the directory listing by extension
excel_files = [f for f in files if f.endswith(('.xlsx', '.xls'))]
# read_excel loads only the first sheet of each file by default
df = pd.concat([pd.read_excel(f) for f in excel_files])

# method 2: use glob instead
# ("*.xlsx" or "*.xls" evaluates to just "*.xlsx", so glob each pattern separately)
excel_files = glob.glob('*.xlsx') + glob.glob('*.xls')
df = pd.concat([pd.read_excel(f) for f in excel_files], ignore_index=True)

# save the data frame
# (DataFrame.append and ExcelWriter.save were removed in pandas 2.0)
df.to_excel('output.xlsx', sheet_name='sheet1')
To concatenate multiple sheets from a single file:
file = pd.ExcelFile('file.xlsx')
names = file.sheet_names # read all sheet names
df = pd.concat([file.parse(name) for name in names])
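The snippet above concatenates every sheet, including the empty ones. As a minimal sketch of the "skip empty sheets" requirement, the helpers below filter on `DataFrame.empty` before concatenating. The names `concat_nonempty` and `read_workbook` are hypothetical, and treating a sheet as empty when it has no rows or no columns is an assumption about what "empty" means here. `pd.read_excel(..., sheet_name=None)` is used as an alternative to `ExcelFile.parse`; it returns a dict mapping each sheet name to its DataFrame.

```python
import pandas as pd


def concat_nonempty(sheets):
    """Concatenate the non-empty DataFrames in a {sheet_name: DataFrame} dict."""
    frames = [frame for frame in sheets.values() if not frame.empty]
    if not frames:
        # every sheet was empty: return an empty frame instead of letting concat raise
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def read_workbook(file_name):
    """Read every sheet of one workbook and stack the non-empty ones."""
    # sheet_name=None makes read_excel return a dict of all sheets
    sheets = pd.read_excel(file_name, sheet_name=None)
    return concat_nonempty(sheets)
```

For a single file this would be called as `df = read_workbook('file.xlsx')`.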
import os

import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# keep only files that end in .xls or .xlsx
excel_files = [file for file in files if file.endswith(('.xls', '.xlsx'))]

def create_df_from_excel(file_name):
    # concatenate every sheet of one workbook into a single DataFrame
    file = pd.ExcelFile(file_name)
    names = file.sheet_names
    return pd.concat([file.parse(name) for name in names])

df = pd.concat(
    [create_df_from_excel(xl) for xl in excel_files]
)

# save the data frame (ExcelWriter.save was removed in pandas 2.0)
df.to_excel('output.xlsx', sheet_name='sheet1')
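One practical wrinkle with listing the working directory: on a second run, output.xlsx itself sits in the same folder and would be read back in, and Excel leaves `~$...` lock files next to any workbook that is open. The following is a sketch of a file-collection helper that avoids both. The name `find_excel_files` and its `exclude` parameter are hypothetical, not part of the original answer.

```python
from pathlib import Path


def find_excel_files(folder, exclude=("output.xlsx",)):
    """Return sorted .xls/.xlsx paths in folder, skipping excluded names and lock files."""
    found = []
    for p in sorted(Path(folder).iterdir()):
        if not p.is_file():
            continue
        # match the extension exactly, ignoring case
        if p.suffix.lower() not in {".xls", ".xlsx"}:
            continue
        # Excel writes "~$name.xlsx" lock files while a workbook is open
        if p.name.startswith("~$") or p.name in exclude:
            continue
        found.append(p)
    return found
```

With this helper, the list comprehension in the answer could iterate over `find_excel_files(path)` instead of `excel_files`.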