Python Pandas - Split Excel Spreadsheet By Empty Rows

Given the following input file ("ToSplit2.xlsx"):

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section Two     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
|           |     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section   Three |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

And the following Python code:

import pandas as pd
import numpy as np

spreadsheetPath = "ToSplit2.xlsx"
xls = pd.ExcelFile(spreadsheetPath)

# Iterate through worksheets in opened Excel file
for sheet in xls.sheet_names:
    # Create a Pandas dataframe from the Excel worksheet (with no headers)
    excel_data_df = pd.read_excel(
        spreadsheetPath, sheet_name=sheet, header=None)

    # Return a list of dataframe index values where entire row is blank
    indexList = excel_data_df[excel_data_df.isnull().all(1)].index.tolist()

    # Prints [11, 23]
    print(indexList)

    # Initiate a dictionary
    dataframeDictionary = {}

    # For every index value in the list
    for index in indexList:
        # Split and add the result to the dictionary of Panda's dataframes
        dataframeDictionary = np.array_split(excel_data_df, index)

    # For every pandas dataframe in the dataframe dictionary
    for dataframe in dataframeDictionary:
        # Write the pandas dataframe to Excel with a worksheet name equal to dataframe address 0,0
        dataframe.to_excel("output.xlsx",sheet_name=str(dataframe.iloc[0][0]))

I am trying to split the Excel worksheet into multiple worksheets, one per section, using the blank rows as separators. For example:

Section One (there would also be Section Two and Section Three worksheets):

+-----------------+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Section One     |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 1   | 100 |     |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 2   | 100 | 200 |     |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 3   | 100 | 200 | 300 |     |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 4   | 100 | 200 | 300 | 400 |     |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 5   | 100 | 200 | 300 | 400 | 500 |     |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 6   | 100 | 200 | 300 | 400 | 500 | 600 |     |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 7   | 100 | 200 | 300 | 400 | 500 | 600 | 700 |     |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 8   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 |     |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 9   | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 |      |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
| Label 10  | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
+-----------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+

I believe I am really close, but I seem to be slipping up on the dataframe splitting.

Change the file names below to match your own files.

import pandas as pd

# Read the worksheet without treating any row as a header
df = pd.read_excel('ToSplit2.xlsx', header=None)

# Positions of the rows that are entirely blank
blank_rows = df.index[df.isnull().all(axis=1)].tolist()

# Slice the dataframe at every blank row
bounds = [0] + blank_rows + [len(df)]
df_list = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Write each section to its own worksheet in a new workbook;
# the with-block saves and closes the file on exit
with pd.ExcelWriter('Excel_one.xlsx', engine='xlsxwriter') as writer:
    for i, section in enumerate(df_list, start=1):
        # Drop the blank separator row that leads every section after the first
        section = section.dropna(how='all')
        section.to_excel(writer, sheet_name='Sheet{}'.format(i), header=False, index=False)
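
The question asks for worksheets named after the first cell of each section, not Sheet1, Sheet2 and so on. A minimal sketch of that variant follows, assuming every section starts with a title row and that titles are unique. The helper split_on_blank_rows and the demo frame are hypothetical names introduced for illustration, and the demo is built in memory so the splitting logic can be checked without an Excel file.

```python
import pandas as pd

def split_on_blank_rows(df):
    """Return {section title: section rows}, splitting at all-blank rows."""
    blank = df.isnull().all(axis=1)
    # Every blank row bumps the counter, so rows between blanks share one id
    group_ids = blank.cumsum()
    sections = {}
    for _, part in df[~blank].groupby(group_ids[~blank]):
        title = str(part.iloc[0, 0])  # first cell of the section, e.g. "Section One"
        sections[title] = part.reset_index(drop=True)
    return sections

# Small in-memory stand-in for the worksheet (hypothetical data)
demo = pd.DataFrame([
    ["Section One", None],
    ["Label 1", 100],
    [None, None],
    ["Section Two", None],
    ["Label 1", 100],
    ["Label 2", 100],
])
sections = split_on_blank_rows(demo)
print(list(sections))  # ['Section One', 'Section Two']
```

Each entry could then be written inside a pd.ExcelWriter block with part.to_excel(writer, sheet_name=title[:31], header=False, index=False). The slice is there because Excel limits worksheet names to 31 characters. Sections with duplicate titles would overwrite one another in the dictionary, so a real file may need a fallback name.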
