Multiple Excel Files as Separate Sheets Using Python

Most of the articles I'm seeing either: a) combine multiple single-sheet Excel workbooks into one master workbook with just a single sheet, or b) split a multi-sheet Excel workbook into individual workbooks.

However, my goal is to grab all the Excel files in a specific folder and save them as individual sheets within one new master Excel workbook. I'm trying to name each sheet after the original file.

import pandas as pd
import glob
import os

file = "C:\\File\\Path\\"
filename = 'Consolidated Files.xlsx'
pth = os.path.dirname(file)
extension = os.path.splitext(file)[1]
files = glob.glob(os.path.join(pth, '*xlsx'))

w = pd.ExcelWriter(file + filename)

for f in files:
    print(f)
    df = pd.read_excel(f, header = None)
    print(df)
    df.to_excel(w, sheet_name = f, index = False)
   
w.save()

How do I adjust the names for each sheet? Also, if you see any opportunities to clean this up, please let me know.

You cannot use f as the sheet name because f is the full path plus the file name, so it contains special characters (such as the backslash and colon) that are not allowed in sheet names. Use only the file name as the sheet name: os.path.basename gets the file name, and split separates the file name from the extension.

for f in files:
    print(f)
    df = pd.read_excel(f, header = None)
    print(df)
    
    # Use basename to get the file name with its extension
    # Use split to separate the file name from the extension
    new_sheet_name = os.path.basename(f).split('.')[0]
    
    # Write the data to a sheet named after the source file
    df.to_excel(w, sheet_name = new_sheet_name , index = False)
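
One caveat with split('.')[0] is that it cuts at the first dot, so a file such as report.v2.xlsx would end up as "report". Below is a minimal sketch of a more defensive variant, assuming you also want to replace the characters Excel rejects in sheet names and stay within the 31-character limit. The helper name sheet_name_from_path and the underscore replacement are my own choices for illustration, not part of the code above.

```python
import os
import re

# Characters Excel rejects in sheet names: \ / ? * [ ] :
_INVALID_SHEET_CHARS = re.compile(r'[\\/?*\[\]:]')
MAX_SHEET_NAME_LEN = 31  # Excel's limit on sheet name length


def sheet_name_from_path(path):
    """Derive an Excel-safe sheet name from a workbook path (hypothetical helper)."""
    # splitext removes only the final extension, so "report.v2.xlsx" -> "report.v2"
    stem = os.path.splitext(os.path.basename(path))[0]
    # Replace characters Excel does not allow, then cut to the length limit
    cleaned = _INVALID_SHEET_CHARS.sub('_', stem)
    return cleaned[:MAX_SHEET_NAME_LEN]
```

In the loop this would be used as df.to_excel(w, sheet_name = sheet_name_from_path(f), index = False). Note that truncation can make two long file names collide, so this sketch does not guarantee unique names.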

I decided to put my solution here as well, just in case it is useful to anyone.

The thing is, I wanted to be able to tell where each resulting sheet came from. However, source workbooks can (and likely will) often share sheet names like "Sheet 1", so I couldn't just reuse the sheet names from the original workbooks. I also couldn't use the source file names as sheet names, since they might be longer than 31 characters, which is the maximum sheet name length Excel allows.

Therefore, I ended up assigning incremental numbers as the resulting sheet names, while inserting a new column named "source" at the start of each sheet and populating it with the file name concatenated with the sheet name. Hope it helps someone :)

from glob import glob
import pandas as pd
import os

files_input = glob(r'C:\Path\to\folder\*.xlsx')

result_DFs = []

for xlsx_file in files_input:
    # sheet_name=None reads every sheet and returns a dict of {sheet name: dataframe}
    file_DFs = pd.read_excel(xlsx_file, sheet_name=None)
    # collect every sheet from every file as a dataframe in one list
    for sheet_name, sheet_DF in file_DFs.items():
        source_name = os.path.basename(xlsx_file) + ":" + sheet_name
        sheet_DF.insert(0, 'source', source_name)
        result_DFs.append(sheet_DF)

# set_column below is an XlsxWriter method, so request that engine explicitly
with pd.ExcelWriter(r'C:\Path\to\resulting\file.xlsx', engine='xlsxwriter') as writer:
    for df_index, df in enumerate(result_DFs):
        sheet_name = str(df_index)
        # write each dataframe to its own sheet, named with a simple incremental number
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        # auto-adjust column width (can be omitted if not needed)
        for i, col in enumerate(df.columns):
            column_len = max(df[col].astype(str).str.len().max(), len(str(col)) + 3)
            writer.sheets[sheet_name].set_column(i, i, column_len)
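
If you would rather keep readable sheet names than plain numbers, one possible alternative is to truncate each label to 31 characters and add a short numeric suffix whenever two labels collide. The sketch below is only an illustration: unique_sheet_names and the "~1" suffix style are hypothetical, and it assumes the labels are already free of characters Excel does not allow in sheet names (the ":" used in the "source" column above would have to be replaced first). As far as I know Excel compares sheet names case-insensitively, so the sketch treats "Sheet1" and "sheet1" as a clash.

```python
def unique_sheet_names(labels, max_len=31):
    """Map each label to a unique name of at most max_len characters (hypothetical helper)."""
    used = set()
    names = []
    for label in labels:
        candidate = label[:max_len]
        counter = 1
        # Track lowercased names so "Sheet1" and "sheet1" count as duplicates
        while candidate.lower() in used:
            suffix = f"~{counter}"
            # Leave room for the suffix so the result never exceeds max_len
            candidate = label[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate.lower())
        names.append(candidate)
    return names
```

The returned list lines up with the input order, so it can be zipped with result_DFs when writing the sheets. The "source" column is still worth keeping, since truncated names lose part of the original file name.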
