
How to read multiple tables from each tab of an Excel sheet in Python?

I have an Excel workbook with multiple tabs, and each tab contains multiple tables. I want to read the file so that every table on every tab is read, for example:

Tab1 has five tables in it.
Tab2 has ten tables in it.
.....
.....

I want to read each table into a pandas DataFrame and then save it to a SQL database. I already know how to read multiple tabs from an Excel file.

Can anyone help me here, or point me in a direction where I can find a lead?

The tables in the tabs are predefined and have names. This is what each tab of the Excel sheet looks like:

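You may need to adapt this to match your data; imagine, for example, that some tables sit above or below one another. Hopefully this points you in the right direction. Also note the number of for loops I used; I'm sure you can do better and optimize it further.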

from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

import pandas as pd

wb = load_workbook(filename="test.xlsx")

sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []

for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that header 
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""): # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max, 
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]


box = []

# here the green rows and the rest of the data
# are combined, then filtered for the respective 
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))

final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))

# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])
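
The filtering and grouping steps above can be tried without a workbook. The following is a minimal sketch, not part of the original answer: it wraps the same logic in a hypothetical helper, `split_tables`, and applies it to made-up data for two tabs, with plain tuples standing in for the cells that openpyxl would return. It assumes, as the code above does, that tables sit side by side; a table stacked below another in the same columns would also be picked up by the upper header.

```python
from collections import defaultdict
from itertools import groupby, product
from operator import itemgetter


def split_tables(headers, cells):
    """Group cell values under the header whose column range covers them.

    headers: list of (name, header_row, column_range)
    cells:   list of (row, column, value), in row-major order
    """
    box = [
        (name, row, col, value)
        for (name, header_row, col_range), (row, col, value) in product(headers, cells)
        if header_row < row and col in col_range
    ]
    tables = defaultdict(list)
    # groupby only merges adjacent items; box is already ordered by header, then row
    for (row, name), group in groupby(box, key=itemgetter(1, 0)):
        tables[name].append([item[-1] for item in group])
    return dict(tables)


# Hypothetical data standing in for two tabs of a workbook
sheets = {
    "Tab1": (
        [("People", 1, range(1, 3)), ("Cities", 1, range(4, 6))],
        [(2, 1, "Ann"), (2, 2, 30), (2, 4, "Oslo"), (2, 5, "NO"),
         (3, 1, "Bob"), (3, 2, 0), (3, 4, "Lima"), (3, 5, "PE")],
    ),
    "Tab2": (
        [("Codes", 2, range(1, 2))],
        [(3, 1, "A"), (4, 1, "B")],
    ),
}

# One dictionary of tables per tab
all_tables = {
    tab: split_tables(headers, cells) for tab, (headers, cells) in sheets.items()
}
```

With a real workbook, the `headers` and `cells` lists for each tab would come from running the cell-scanning loop above on every sheet in `wb.worksheets` rather than only on `wb["Sheet1"]`.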

You can create a DataFrame for each header:

pd.DataFrame(final["Address Association"])


    0                         1                   2                             3           4                         5
0   Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
1   Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
2   Mandatory                 Yes                 Yes                           Yes         No                        Yes
3   Foreign Key               -                   -                             -           -                         -
4   Optional Feature          -                   -                             -           -                         -
5   Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
6   Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
7   Address.Billing           Billing             address for billing.          2           Address.Billing           0
8   Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
9   Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
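
The question also asks about saving each table to a SQL database. The following is a rough sketch using only the standard-library `sqlite3` module; the helper `save_table` and the sample data are hypothetical, and it assumes the first row of each extracted table holds the column names and that all rows have the same length. If you are already working with DataFrames, pandas' `DataFrame.to_sql` is another route.

```python
import sqlite3


def save_table(conn, table_name, rows):
    """Write one extracted table; rows[0] is assumed to hold the column names."""
    header, *body = rows
    # Quote identifiers so names with spaces or dots are accepted by SQLite
    columns = ", ".join('"{}"'.format(str(name).replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quoted_table} ({columns})")
    conn.executemany(f"INSERT INTO {quoted_table} VALUES ({placeholders})", body)
    conn.commit()


# Hypothetical stand-in for the `final` mapping built by the code above
final = {
    "Address Association": [
        ["Name", "SortOrder"],
        ["Address.Primary", 1],
        ["Address.Billing", 2],
    ]
}

conn = sqlite3.connect(":memory:")  # swap in a file path for a persistent database
for name, rows in final.items():
    save_table(conn, name, rows)

# Read the rows back to confirm they were stored
stored = conn.execute('SELECT * FROM "Address Association"').fetchall()
```

The columns are created without declared types here, so SQLite stores each value as it is given; for a real schema you would probably want to declare types, for instance from the "Field Type" row shown in the output above.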
