How to read multiple tables from each tab of an Excel sheet in Python?
I have an Excel sheet with multiple tabs, and each tab contains multiple tables. I want to read the file in such a way that every table from every tab is read. For example:
Tab1 has five tables in it.
Tab2 has ten tables in it.
.....
.....
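I want to read each table into a pandas DataFrame and then save it to a SQL database. I already know how to read multiple tabs from an Excel sheet.
Can anyone help me here, or point me in a direction where I can find a lead?
The tables in the tabs are predefined and have names. This is what each tab of the Excel sheet looks like.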
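You may need to adjust this to match your data; imagine, for instance, that you have tables both below and above one another. Hopefully this points you in the right direction. Also, note the number of for loops I used; I'm sure you can do better and optimize it further.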
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

import pandas as pd
from openpyxl import load_workbook

wb = load_workbook(filename="test.xlsx")
sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []
for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that the header
            # is the first entry in that row: empty green cells
            # that follow reuse the last header value seen
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""):  # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max;
# range() excludes its end, so this keeps the last column
# when the data is sorted into sections below
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]

box = []
# here the green rows and the rest of the data
# are combined, then filtered for the respective
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs: keep a cell only if it is
    # below the header row and inside the header's column span
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))

final = defaultdict(list)
# group consecutive cells by (row, header) so each group is one table row
content = groupby(box, itemgetter(1, 0))
# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])
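The filtering and grouping steps above can be traced on a small toy input without an Excel file. This is only a sketch: the `headers` and `cells` values are made up for illustration and stand in for `green_rows` and `rest_data`.

```python
from collections import defaultdict
from itertools import groupby, product
from operator import itemgetter

# Hypothetical toy data: two side-by-side headers as (name, row, column range),
# and data cells as (row, column, value) in row-major order.
headers = [("A", 1, range(1, 3)), ("B", 1, range(4, 6))]
cells = [
    (2, 1, "a1"), (2, 2, "a2"), (2, 4, "b1"), (2, 5, "b2"),
    (3, 1, "a3"), (3, 2, "a4"), (3, 4, "b3"), (3, 5, "b4"),
]

# Keep a cell only if it sits below a header and inside its column span.
box = [
    (name, row, col, value)
    for (name, header_row, cols), (row, col, value) in product(headers, cells)
    if header_row < row and col in cols
]

# groupby only merges adjacent items; that is enough here because product()
# yields every cell for one header, in row order, before the next header.
tables = defaultdict(list)
for (row, name), group in groupby(box, itemgetter(1, 0)):
    tables[name].append([item[-1] for item in group])

print(dict(tables))
```

Note that the filter only checks that a cell lies below a header. If two tables are stacked in the same columns, the lower table's cells would also be collected under the upper header, which is one of the adjustments you may need for your own data.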
You can create a DataFrame for each header:
pd.DataFrame(final["Address Association"])
    0                         1                   2                             3           4                         5
0   Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
1   Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
2   Mandatory                 Yes                 Yes                           Yes         No                        Yes
3   Foreign Key               -                   -                             -           -                         -
4   Optional Feature          -                   -                             -           -                         -
5   Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
6   Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
7   Address.Billing           Billing             address for billing.          2           Address.Billing           0
8   Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
9   Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
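For the last part of the question, saving each table to a SQL database, something along these lines might work. It is a minimal sketch under several assumptions: the `final` dict below is a small made-up stand-in for the real one, the first row of each table is taken as the column labels, the helper `rows_to_frame` is hypothetical, and an in-memory SQLite database is used only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the `final` dict built earlier:
# table name -> list of rows, with the column labels in the first row.
final = {
    "Address Association": [
        ["Column Name in DB", "Name", "SortOrder"],
        ["Address.Primary", "Primary", 1],
        ["Address.Billing", "Billing", 2],
    ],
}


def rows_to_frame(rows):
    """Build a DataFrame, using the first row as column labels (assumption)."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# In-memory SQLite database, chosen here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
for name, rows in final.items():
    frame = rows_to_frame(rows)
    # One SQL table per Excel table; spaces in the name become underscores.
    table_name = name.replace(" ", "_")
    frame.to_sql(table_name, conn, if_exists="replace", index=False)

print(pd.read_sql("SELECT * FROM Address_Association", conn))
```

To cover every tab rather than only "Sheet1", the extraction code could be wrapped in a function and called once per worksheet, for example by looping over `wb.worksheets`.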