[英]How to read multiple tables from each tab of an excel sheet in Python?
So I've an excel sheet that has multiple tabs and each individual tab has multiple tables in it.所以我有一个 excel 表,它有多个选项卡,每个单独的选项卡中都有多个表。 So i want to read the file in such a way that it reads each table from each tab of the sheet, for instance,
所以我想以这样一种方式读取文件,即它从工作表的每个选项卡读取每个表,例如,
Tab1 has five tables in it.
Tab2 has Ten tables in it.
.....
.....
I want to read each one of these table in pandas dataframe and then save it to sql database.我想读取 pandas dataframe 中的每一个表,然后将其保存到 sql 数据库中。 I know how how to read multiple tabs from the excel sheet.
我知道如何从 excel 表中读取多个选项卡。
Can anyone help me out here or point me to a direction where i can find a lead?任何人都可以在这里帮助我或指出我可以找到线索的方向吗?
The tables in the tab are pre-defined and have name.选项卡中的表是预定义的并具有名称。 Thats how it looks like in each tab Tab from excel sheet
这就是excel 工作表中每个选项卡中的样子
You probably have to tweak it to match your data;您可能需要调整它以匹配您的数据; imagine if you have some tables below and some above.
想象一下,如果您在下方和上方有一些表格。 This, hopefully, should point you in the right direction.
希望这会为您指明正确的方向。 Also, note the number of for loops I used;
另外,请注意我使用的 for 循环的数量; I believe you can do better and optimize it further.
我相信你可以做得更好,进一步优化。
from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter
wb = load_workbook(filename="test.xlsx")
sheet = wb["Sheet1"]
green_rows = defaultdict(list)
rest_data = []
for row in sheet:
for cell in row:
look for the green rows; they contain the headers
if cell.fill.fgColor.rgb == "FFA2D722":
# take advantage of the fact that header
# is the first entry in that row
if cell.value:
val = cell.value
green_rows[(val, cell.row)].append(cell.column)
else:
if cell.value not in (None, ""): # so the 0s are not lost
rest_data.append((cell.row, cell.column, cell.value))
# get the max and minimum column positions
# note the addition of 1 to the max,
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
(name, row, range(min(value), max(value) + 1))
for (name, row), value in green_rows.items()
]
box = []
# here the green rows and the rest of the data
# are combined, then filtered for the respective
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
cell_row,
cell_column,
cell_value,
) in combo:
# this is where the filtration occurs
if (header_row < cell_row) and (cell_column in header_column_range):
box.append((header, cell_row, cell_column, cell_value))
final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))
# another iteration to get the final result
for key, value in content:
final[key[-1]].append([val[-1] for val in value])
You can create your dataframe for each of the headers:您可以为每个标头创建 dataframe:
pd.DataFrame(final["Address Association"])
0 1 2 3 4 5
0 Column Name in DB Name Description SortOrder BusinessMeaningName Obsolete
1 Field Type nvarchar(100) nvarchar(255) int nvarchar(50) bit
2 Mandatory Yes Yes Yes No Yes
3 Foreign Key - - - - -
4 Optional Feature - - - - -
5 Field Name in U4SM Name Description Sort Order Business Meaning Name Obsolete
6 Address.Primary Primary Use this address by default. 1 Address.Primary 0
7 Address.Billing Billing address for billing. 2 Address.Billing 0
8 Address.Emergency Emergency use this for emergency. 3 Address.Emergency 0
9 Address.Emergency SMS Emergency SMS use this for emergency SMS. 4 Address.Emergency SMS 0
10 Address.Deceased Deceased address for deceased. 5 Address.Deceased 0
11 Address.Home Home address for home. 8 Address.Home 0
12 Address.Mailing Mailing address for mailing. 9 Address.Mailing 0
13 Address.Mobile Mobile use this for mobile. 10 Address.Mobile 0
14 Address.School School address for school. 13 Address.School 0
15 Address.SMS SMS use this for SMS text. 15 Address.SMS 0
16 Address.Work Work address for work 16 Address.Work 0
17 Address.Permanent Permanent Permanent Address 17 Address.Permanent 0
18 Address.HallsOfResidence Halls of Residence Halls of Residence 18 Address.HallsOfResidence 0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.