
How to iterate through a folder of Excel workbooks, each containing multiple worksheets, and create a single data frame?

The following function is what I came up with to iterate through multiple Excel files and store the data in a single data frame. However, only the data from the final file ends up in the final data frame. What should I do to get the data from all the files into the same df and then export it to a CSV file?

def excel_to_df(folder, start_row, end_row, start_col, end_col):
    # loop through all excel files in the folder
    for file in os.listdir(folder):
        exact_file_path = folder +  "\\\\" + file 
        print(exact_file_path)
        # check if file is an excel file
        if file.endswith('xlsx'):
            # create workbook and its worksheets for each file
            wb = openpyxl.load_workbook(exact_file_path)
            ws = wb.worksheets

        # create a list to store the dataframes
        df_list = []

        # iterate over the worksheets
        for worksheet in ws:
            # get the name of the worksheet
            name = worksheet.title
            # create an empty list to store the values
            cell_values = []

            # iterate over the rows and columns in the range

            for row in worksheet.iter_rows(min_row = row_min, max_row = row_max,
                                            min_col = col_min, max_col = col_max):


                                            # append the cell values to the list
                                            cell_values.append([cell.value for cell in row])

                                            # create a dataframe from the cell values and the worksheet name
                                            df = pd.DataFrame(cell_values, columns=range(start_col, end_col+1), index=[name]*len(cell_values))

                                            # append the df to the list
                                            df_list.append(df)


                                            # concatenate the list of dataframes into a single dataframe
                                            df = pd.concat(df_list)
                                            # save the output to a csv file
                                            df.to_csv('test.csv', index=True)

                                            return df

Your immediate problem is that you're creating df_list inside the loop, so each time the loop starts over it overwrites whatever was already in it. Additionally, your return is at the end of (and inside) the loop, so it never gets to the second element: when execution reaches the return, the function hands back what it has and stops running. You just need to rearrange it, like this:

import os

import openpyxl
import pandas as pd


def excel_to_df(folder, start_row, end_row, start_col, end_col):

    # create the list once, before the loop, so it accumulates across all files
    df_list = []
    # loop through all excel files in the folder
    for file in os.listdir(folder):
        # check if file is an excel file
        if not file.endswith('.xlsx'):
            # if the file doesn't end with .xlsx, skip it instead of trying to open it
            continue
        exact_file_path = os.path.join(folder, file)
        print(exact_file_path)
        # create workbook and its worksheets for each file
        wb = openpyxl.load_workbook(exact_file_path)
        ws = wb.worksheets
        # iterate over the worksheets
        for worksheet in ws:
            # get the name of the worksheet
            name = worksheet.title
            # create an empty list to store the values
            cell_values = []

            # iterate over the rows and columns in the range
            for row in worksheet.iter_rows(min_row=start_row, max_row=end_row,
                                           min_col=start_col, max_col=end_col):
                # append the cell values to the list
                cell_values.append([cell.value for cell in row])

            # create a dataframe from the cell values and the worksheet name
            df = pd.DataFrame(cell_values, columns=range(start_col, end_col+1), index=[name]*len(cell_values))

            # append the df to the list
            df_list.append(df)


    # concatenate the list of dataframes into a single dataframe
    df = pd.concat(df_list)
    # save the output to a csv file
    df.to_csv('test.csv', index=True)

    return df
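
If it helps to see the two mistakes in isolation, here is a toy sketch with no Excel involved. The function names and the sample data are made up for illustration; the nested lists stand in for files and their rows.

```python
def collect_buggy(groups):
    for group in groups:
        items = []              # reset on every pass, so earlier groups are lost
        for value in group:
            items.append(value)
        return items            # exits the function during the first pass


def collect_fixed(groups):
    items = []                  # created once, before the loop
    for group in groups:
        for value in group:
            items.append(value)
    return items                # reached only after every group is processed


groups = [[1, 2], [3, 4], [5]]
print(collect_buggy(groups))    # [1, 2]
print(collect_fixed(groups))    # [1, 2, 3, 4, 5]
```

The fixed version has the same shape as the rearranged excel_to_df: the accumulator is created before the outer loop and the return comes after it.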

As an aside, is there a reason you're manually creating a DF instead of just using pd.read_excel? If not, I'd recommend getting rid of your for row loop and just using pd.read_excel(exact_file_path, sheet_name=worksheet.title).
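
A rough sketch of that pd.read_excel route, in case it's useful. It assumes pandas can read .xlsx files in your environment (it uses openpyxl for that), and it passes sheet_name=None so that pandas returns every sheet of a workbook at once as a dict of DataFrames. The function name excel_folder_to_df, the csv_path parameter, and the "file" and "sheet" columns are my own additions, not something from your code.

```python
import os

import pandas as pd


def excel_folder_to_df(folder, csv_path=None):
    """Read every sheet of every .xlsx file in `folder` into one DataFrame."""
    frames = []
    for file in sorted(os.listdir(folder)):
        # skip anything that is not an .xlsx workbook
        if not file.endswith(".xlsx"):
            continue
        path = os.path.join(folder, file)
        # sheet_name=None returns a dict mapping sheet name -> DataFrame;
        # header=None keeps the first row as data instead of column names
        sheets = pd.read_excel(path, sheet_name=None, header=None)
        for sheet_name, sheet_df in sheets.items():
            # record where each row came from (hypothetical column names)
            sheet_df.insert(0, "sheet", sheet_name)
            sheet_df.insert(0, "file", file)
            frames.append(sheet_df)
    # pd.concat raises ValueError if no workbook was found in the folder
    combined = pd.concat(frames, ignore_index=True)
    if csv_path is not None:
        combined.to_csv(csv_path, index=False)
    return combined
```

Calling excel_folder_to_df(folder, csv_path="test.csv") would then write the combined result to a CSV file as well. This version reads whole sheets; if you still need only a fixed cell range, pd.read_excel also accepts usecols, skiprows and nrows, which you would have to derive from your start/end row and column values.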

