How to read multiple tables from each tab of an Excel sheet in Python?

I have an Excel workbook with multiple tabs, and each tab contains multiple tables. I want to read the file so that every table from every tab is read separately. For instance:

Tab1 has five tables in it.
Tab2 has ten tables in it.
.....
.....

I want to read each of these tables into a pandas DataFrame and then save it to a SQL database. I already know how to read multiple tabs from an Excel file.

Can anyone help me out here or point me in a direction where I can find a lead?

The tables in each tab are predefined and have names. This is what each tab looks like (screenshot: "Tab from excel sheet").

You will probably have to tweak this to match your data, for example if some tables sit below others and some above. Hopefully this points you in the right direction. Also note the number of for loops I used; I believe you can do better and optimize it further.

from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

wb = load_workbook(filename="test.xlsx")

sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []

for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that header 
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""): # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max, 
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]


box = []

# here the green rows and the rest of the data
# are combined, then filtered for the respective 
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))

final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))

# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])
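To make the filter-and-group step easier to follow, here is a minimal sketch of the same idea on hand-written data, with no workbook involved. The table names and cell values are hypothetical stand-ins for what the openpyxl loop collects. The explicit sort is an addition in this sketch: `groupby` only merges adjacent items, so sorting by (row, header) first removes the dependence on the order in which `product` yields the pairs.

```python
from collections import defaultdict
from itertools import groupby, product
from operator import itemgetter

# Hypothetical stand-ins for what the workbook loop collects:
# headers as (table name, header row, column range),
# data cells as (row, column, value).
green_rows = [("Table A", 1, range(1, 3)), ("Table B", 1, range(4, 6))]
rest_data = [
    (2, 1, "id"), (2, 2, "name"), (2, 4, "code"), (2, 5, "label"),
    (3, 1, 1), (3, 2, "x"), (3, 4, 0), (3, 5, "y"),
]

# Keep only cells that sit below a header and inside its column range.
box = [
    (header, row, col, value)
    for (header, header_row, cols), (row, col, value)
    in product(green_rows, rest_data)
    if header_row < row and col in cols
]

# groupby merges adjacent items only, so sort by (row, header) first.
# The sort is stable, so cells keep their left-to-right column order.
box.sort(key=itemgetter(1, 0))

final = defaultdict(list)
for (row, header), cells in groupby(box, key=itemgetter(1, 0)):
    # One list per worksheet row, appended to the table it belongs to.
    final[header].append([cell[-1] for cell in cells])

for name, rows in final.items():
    print(name, rows)
```

Each key of `final` then maps to a list of rows, which is the shape `pd.DataFrame` accepts directly.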

You can create a DataFrame for each of the headers:

import pandas as pd

pd.DataFrame(final["Address Association"])


    0                         1                   2                             3           4                         5
0   Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
1   Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
2   Mandatory                 Yes                 Yes                           Yes         No                        Yes
3   Foreign Key               -                   -                             -           -                         -
4   Optional Feature          -                   -                             -           -                         -
5   Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
6   Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
7   Address.Billing           Billing             address for billing.          2           Address.Billing           0
8   Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
9   Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
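Since the question also asks about saving each table to a SQL database, here is a rough sketch of that last step. It assumes the first row of a table holds the column names, which may not match your layout, and it uses an in-memory SQLite database as a stand-in for the real target. The sample rows, the helper `rows_to_frame`, and the table name `address_association` are all hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical rows in the same list-of-lists shape as final["..."];
# the first row is assumed to hold the column names.
rows = [
    ["Name", "Description", "SortOrder"],
    ["Primary", "Use this address by default.", 1],
    ["Billing", "address for billing.", 2],
]


def rows_to_frame(rows):
    """Use the first row as column names and the remaining rows as data."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = rows_to_frame(rows)

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
try:
    # Write the frame, then read it back to confirm the round trip.
    df.to_sql("address_association", conn, index=False, if_exists="replace")
    saved = pd.read_sql("SELECT * FROM address_association", conn)
finally:
    conn.close()

print(saved)
```

For another database you would typically pass a SQLAlchemy engine to `to_sql` instead of the `sqlite3` connection, and loop over the items of `final` to write one SQL table per Excel table.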
