Create dataframes in for loop from multiple Excel workbooks based on worksheet name?

I have a folder of a few hundred Excel files, all organized identically, with nine sheets in each workbook. I am running the following code to iterate over the files and create one dataframe per worksheet across all workbooks (so the dataframe "sheet_a_df" will be sheet "a" from every workbook concatenated into a single dataframe).

import glob
import pandas as pd

sheet_a_df = pd.DataFrame()
for file in glob.glob('C:\\Users\\*.xlsx'):
    df = pd.read_excel(file,sheetname='a')
    sheet_a_df = sheet_a_df.append(df,ignore_index=True).dropna()

sheet_b_df = pd.DataFrame()
for file in glob.glob('C:\\Users\\*.xlsx'):
    df = pd.read_excel(file,sheetname='b')
    sheet_b_df = sheet_b_df.append(df,ignore_index=True).dropna()

# And so on for all nine sheet names...

However, this requires copying and pasting the code nine times (once for each sheet).

Is there a more appropriate way to do this?

Reviewing this question, I understand that dictionaries are the way to go for creating multiple dataframes in a for loop. I am also trying to name each df according to the worksheet's name. I created a list of my sheet names and tried the following code, but I am getting a KeyError that simply returns the first sheet's name.

sheet_names = ['a',
               'b',
               'c',
               ...,]

df_dict = {}

for file in glob.glob('C:\\Users\\*.xlsx'):
    for sheet in sheet_names:
        df = pd.read_excel(file,sheetname=sheet)
        df_dict[sheet] = df_dict[sheet].append(df)

Is there a way to fix the above code to create all nine dfs while naming them according to the sheets they come from?

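You are trying to append a dataframe to a dictionary item that does not exist yet. df_dict starts out empty, so evaluating df_dict[sheet] on the right-hand side raises a KeyError the first time each sheet name is seen. You should first check whether the key exists: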

for file in glob.glob('C:\\Users\\*.xlsx'):
    for sheet in sheet_names:
        df = pd.read_excel(file,sheetname=sheet)
        if sheet in df_dict:
            df_dict[sheet] = df_dict[sheet].append(df)
        else:
            df_dict[sheet] = df
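
The KeyError can be reproduced without pandas or any Excel files. Here is a minimal sketch that uses plain lists as stand-ins for dataframes; the `workbooks` data and the name `rows_by_sheet` are made up for illustration:

```python
# Stand-ins for two workbooks, each mapping sheet name -> rows (made-up data).
workbooks = [
    {'a': [1, 2], 'b': [10]},
    {'a': [3], 'b': [20, 30]},
]
sheet_names = ['a', 'b']

# Reading a key that was never assigned raises KeyError,
# which is what happens on the first pass of the question's loop.
rows_by_sheet = {}
try:
    rows_by_sheet['a'] = rows_by_sheet['a'] + workbooks[0]['a']
    error = None
except KeyError as exc:
    error = exc  # the missing key is the first sheet name

# Checking for the key first avoids the error.
for workbook in workbooks:
    for sheet in sheet_names:
        if sheet in rows_by_sheet:
            rows_by_sheet[sheet] = rows_by_sheet[sheet] + workbook[sheet]
        else:
            rows_by_sheet[sheet] = workbook[sheet]
```

The failed assignment leaves the dictionary empty, so the second loop starts from scratch and ends with one combined list per sheet name.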

You can take advantage of the fact that if you pass a list of sheet names to the sheetname parameter of pd.read_excel, it returns a dictionary of dataframes in which the keys are the sheet names and the values are the corresponding dataframes. As a result, the following should give you a dictionary of concatenated dataframes: all "a" dataframes together, all "b" dataframes together, and so on.

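sheet_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
data = {}
for fn in glob.glob('C:\\Users\\*.xlsx'):
    # Passing a list of names returns a dict: sheet name -> dataframe
    dfs = pd.read_excel(fn, sheetname=sheet_names)
    for k in dfs:
        # Start from an empty dataframe the first time a sheet name is seen
        data.setdefault(k, pd.DataFrame())
        data[k] = pd.concat([data[k], dfs[k]])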

Now data should be a dictionary of dataframes whose keys are the elements of sheet_names. Its values are the concatenated dataframes of the corresponding sheets from your files.
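
On recent pandas releases the same idea needs two adjustments: the read_excel keyword is spelled `sheet_name`, and `DataFrame.append` is no longer available, so `pd.concat` is the way to combine frames. The sketch below works under those assumptions. `collect_by_sheet`, `read_with_pandas`, and `concat_all` are hypothetical helper names, not part of pandas, and reading .xlsx files also assumes an Excel engine such as openpyxl is installed:

```python
from collections import defaultdict


def collect_by_sheet(paths, sheet_names, read_workbook):
    """Group the per-workbook frames into one list per sheet name."""
    frames = defaultdict(list)
    for path in paths:
        # read_workbook must return a dict mapping sheet name -> frame
        for name, frame in read_workbook(path, sheet_names).items():
            frames[name].append(frame)
    return dict(frames)


def read_with_pandas(path, sheet_names):
    """Read several sheets at once; a list of names yields a dict of DataFrames."""
    import pandas as pd  # imported here so collect_by_sheet works without pandas
    return pd.read_excel(path, sheet_name=sheet_names)


def concat_all(frames):
    """Concatenate each sheet's list of DataFrames once, at the end."""
    import pandas as pd
    return {name: pd.concat(parts, ignore_index=True)
            for name, parts in frames.items()}


# Example call, using the path pattern from the question:
#   import glob
#   paths = glob.glob('C:\\Users\\*.xlsx')
#   data = concat_all(collect_by_sheet(paths, sheet_names, read_with_pandas))
```

Collecting the frames in lists and concatenating once per sheet avoids copying an ever-growing dataframe on every pass through the loop.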

I hope this helps.
