如何从 Python 中的 excel 工作表的每个选项卡中读取多个表格？

Question

So I've an excel sheet that has multiple tabs and each individual tab has multiple tables in it.所以我有一个 excel 表，它有多个选项卡，每个单独的选项卡中都有多个表。 So i want to read the file in such a way that it reads each table from each tab of the sheet, for instance,所以我想以这样一种方式读取文件，即它从工作表的每个选项卡读取每个表，例如，

Tab1 has five tables in it.
Tab2 has Ten tables in it.
.....
.....

I want to read each one of these table in pandas dataframe and then save it to sql database.我想读取 pandas dataframe 中的每一个表，然后将其保存到 sql 数据库中。 I know how how to read multiple tabs from the excel sheet.我知道如何从 excel 表中读取多个选项卡。

Can anyone help me out here or point me to a direction where i can find a lead?任何人都可以在这里帮助我或指出我可以找到线索的方向吗？

The tables in the tab are pre-defined and have name.选项卡中的表是预定义的并具有名称。 Thats how it looks like in each tab Tab from excel sheet这就是excel 工作表中每个选项卡中的样子

Answer 1

You probably have to tweak it to match your data;您可能需要调整它以匹配您的数据； imagine if you have some tables below and some above.想象一下，如果您在下方和上方有一些表格。 This, hopefully, should point you in the right direction.希望这会为您指明正确的方向。 Also, note the number of for loops I used;另外，请注意我使用的 for 循环的数量； I believe you can do better and optimize it further.我相信你可以做得更好，进一步优化。

from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

wb = load_workbook(filename="test.xlsx")

sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []

for row in sheet:
    for cell in row:
        look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that header 
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""): # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max, 
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]


box = []

# here the green rows and the rest of the data
# are combined, then filtered for the respective 
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))

final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))

# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])

You can create your dataframe for each of the headers:您可以为每个标头创建 dataframe：

pd.DataFrame(final["Address Association"])


0   1   2   3   4   5
0   Column Name in DB   Name    Description SortOrder   BusinessMeaningName Obsolete
1   Field Type  nvarchar(100)   nvarchar(255)   int nvarchar(50)    bit
2   Mandatory   Yes Yes Yes No  Yes
3   Foreign Key -   -   -   -   -
4   Optional Feature    -   -   -   -   -
5   Field Name in U4SM  Name    Description Sort Order  Business Meaning Name   Obsolete
6   Address.Primary Primary Use this address by default.    1   Address.Primary 0
7   Address.Billing Billing address for billing.    2   Address.Billing 0
8   Address.Emergency   Emergency   use this for emergency. 3   Address.Emergency   0
9   Address.Emergency SMS   Emergency SMS   use this for emergency SMS. 4   Address.Emergency SMS   0
10  Address.Deceased    Deceased    address for deceased.   5   Address.Deceased    0
11  Address.Home    Home    address for home.   8   Address.Home    0
12  Address.Mailing Mailing address for mailing.    9   Address.Mailing 0
13  Address.Mobile  Mobile  use this for mobile.    10  Address.Mobile  0
14  Address.School  School  address for school. 13  Address.School  0
15  Address.SMS SMS use this for SMS text.  15  Address.SMS 0
16  Address.Work    Work    address for work    16  Address.Work    0
17  Address.Permanent   Permanent   Permanent Address   17  Address.Permanent   0
18  Address.HallsOfResidence    Halls of Residence  Halls of Residence  18  Address.HallsOfResidence    0

如何从 Python 中的 excel 工作表的每个选项卡中读取多个表格？

问题描述

1 个解决方案

解决方案1
1 2020-12-22 03:24:42

如何从 Python 中的 excel 工作表的每个选项卡中读取多个表格？

问题描述

1 个解决方案

解决方案1 1 2020-12-22 03:24:42

解决方案1
1 2020-12-22 03:24:42