Python Pandas - Split an Excel spreadsheet by blank rows

Question

Given the following input file ("ToSplit2.xlsx"):

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section Two     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section Three   |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

And the following Python code:

import pandas as pd
import numpy as np

spreadsheetPath = "ToSplit2.xlsx"
xls = pd.ExcelFile(spreadsheetPath)

# Iterate through worksheets in opened Excel file
for sheet in xls.sheet_names:
    # Create a Pandas dataframe from the Excel worksheet (with no headers)
    excel_data_df = pd.read_excel(
        spreadsheetPath, sheet_name=sheet, header=None)

    # Return a list of dataframe index values where entire row is blank
    indexList = excel_data_df[excel_data_df.isnull().all(1)].index.tolist()

    # Prints [11, 23]
    print(indexList)

    # Initiate a dictionary
    dataframeDictionary = {}

    # For every index value in the list
    for index in indexList:
        # Split and add the result to the dictionary of Panda's dataframes
        dataframeDictionary = np.array_split(excel_data_df, index)

    # For every pandas dataframe in the dataframe dictionary
    for dataframe in dataframeDictionary:
        # Write the pandas dataframe to Excel with a worksheet name equal to dataframe address 0,0
        dataframe.to_excel("output.xlsx",sheet_name=str(dataframe.iloc[0][0]))

I am trying to split the Excel worksheet into multiple worksheets based on the blank rows. E.g.:

Section One: (there would also be Section Two and Section Three worksheets)

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

I believe I am really close, but I seem to be slipping up on the data frame splitting.
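
One possible reason the split misbehaves, offered as a sketch rather than a definitive diagnosis: when the second argument of np.array_split is an integer, NumPy reads it as the number of roughly equal sections, not as a row position, and the loop above reassigns dataframeDictionary on every pass. The toy array below is an assumption for illustration, not the real worksheet; it contrasts an integer argument with a list of split positions.

```python
import numpy as np

rows = np.arange(6)  # stand-in for six worksheet rows

# An integer second argument means "this many roughly equal sections"
by_count = np.array_split(rows, 2)

# A list second argument means "start a new section at each of these positions"
by_position = np.split(rows, [2, 4])

print([part.tolist() for part in by_count])     # [[0, 1, 2], [3, 4, 5]]
print([part.tolist() for part in by_position])  # [[0, 1], [2, 3], [4, 5]]
```

Separately, calling to_excel with a file path writes a fresh file each time, so only the last dataframe would survive in output.xlsx.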

Answer 1

Adjust the file names to match your own.

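import pandas as pd

# Read the worksheet without headers; fully blank rows come back as all-NaN rows
df = pd.read_excel('ToSplit2.xlsx', header=None)

# Row positions of the blank rows (the default RangeIndex makes labels equal to positions)
blank_rows = df[df.isnull().all(axis=1)].index.tolist()

# Slice between consecutive blank rows with iloc; np.split on a DataFrame relies on
# DataFrame.swapaxes, which recent pandas versions deprecate
bounds = [0] + blank_rows + [len(df)]
df_list = [df.iloc[start:end] for start, end in zip(bounds[:-1], bounds[1:])]

# Write each section to its own worksheet; the context manager saves the file on exit
with pd.ExcelWriter('Excel_one.xlsx', engine='xlsxwriter') as writer:
    for i, section in enumerate(df_list, start=1):
        # Drop the blank separator row that leads every section after the first
        section = section.dropna(how='all')
        section.to_excel(writer, sheet_name='Sheet{}'.format(i), header=False, index=False)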
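
The question asked for each worksheet to be named after the cell at position 0,0 of its section, whereas the sheets above are numbered. A minimal sketch of that variant follows, assuming section titles are unique and sit in the first column. The helper names split_on_blank_rows and write_sections, the demo frame, and the output path are hypothetical and not part of the original answer.

```python
import pandas as pd


def split_on_blank_rows(df):
    """Return {section title: DataFrame} for blocks separated by all-NaN rows."""
    is_blank = df.isnull().all(axis=1)
    # Every blank row bumps the running count, so each block gets its own group id;
    # the blank rows themselves are filtered out
    group_ids = is_blank.cumsum()[~is_blank]
    sections = {}
    for _, block in df[~is_blank].groupby(group_ids):
        title = str(block.iloc[0, 0])  # first cell of the block, as in the question
        sections[title] = block.reset_index(drop=True)
    return sections


def write_sections(sections, path):
    """Write one worksheet per section (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for title, block in sections.items():
            # Excel limits worksheet names to 31 characters
            block.to_excel(writer, sheet_name=title[:31], header=False, index=False)


# Small in-memory stand-in for the worksheet (hypothetical data)
demo = pd.DataFrame([
    ["Section One", None],
    ["Label 1", 100],
    [None, None],
    ["Section Two", None],
    ["Label 1", 100],
])
sections = split_on_blank_rows(demo)
print(list(sections))  # ['Section One', 'Section Two']
```

With the real file, something like write_sections(split_on_blank_rows(pd.read_excel('ToSplit2.xlsx', header=None)), 'output.xlsx') would be the intended call. Duplicate titles would overwrite each other in the dictionary, so this only suits sheets whose section names differ.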

Solution 1: 2, accepted, 2020-09-22 11:55:35
