How to fetch data from another Excel tab using a formula as the reference, with Python and Pandas (or something similar)

Question

First, I'm not sure whether pandas is the right approach here; it may be better done with VBA or another library such as openpyxl.

I have an Excel workbook with two tabs. tab1 has a name and a value, where the value is a formula such as ='tab2'!H10. tab2 holds the referenced value (or the values being summed) along with a bunch of other information.

I want to read the value column on tab1, which may reference more than one cell on the second tab, for example ='tab2'!H10 + 'tab2'!H12 + 'tab2'!H20 on the row for Name1. From that I want to extract the ROWS (rows 10, 12 and 20 in this example) and fetch information from 3 columns on tab2 for those rows.

Then I want to "join" (not sure if it is really a join) the name on tab1 with those 3 columns from tab2 for those rows. Something like this as the end result:

This is the code I'm trying. It is not currently working and fails with ValueError: cannot join with no overlapping index names

import numpy as np
import pandas as pd
from IPython.display import display
from openpyxl import Workbook
from openpyxl import load_workbook

wbx = load_workbook(filename= 'test.xlsx')

sheet_names = wbx.sheetnames

name1 = sheet_names[0]
sheet_ranges1 = wbx[name1]

df1 = pd.DataFrame(sheet_ranges1.values)

name2 = sheet_names[1]
sheet_ranges2 = wbx[name2]

df2 = pd.DataFrame(sheet_ranges2.values)

pd.set_option("display.max_rows", None, "display.max_columns", None)

c1 = df1.iloc[:,[1]]
c2 = df1.iloc[:,24]
print(c1.dtypes)

res = c2.str.extractall(r"!H(?P<line>\d+)?")
res2 = c1.merge(pd.DataFrame(res), how='left', left_index=True, right_index=True)

Answer 1

Hope it helps:

import pandas as pd
df1 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet1')
df2 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet2')
df3 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet3')

# Drop columns you do not want in the merged result, or that would
# otherwise show up twice as col_x and col_y after the merge.
df1 = df1.drop(columns=['col2', 'col3'])

# join table
dfx = df1.merge(df2, how="inner", left_on="col1", right_on="col2")
merged = dfx.merge(df3, how="left", left_on="col7", right_on="col3")
print(merged.head())
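
To illustrate the comment above about col_x and col_y, here is a minimal sketch with two small in-memory frames. The frames and the shared column name "value" are made up for the example and stand in for sheets read with pd.read_excel:

```python
import pandas as pd

# Hypothetical stand-ins for two sheets that share a column name ("value").
left = pd.DataFrame({"col1": [1, 2, 3], "value": ["a", "b", "c"]})
right = pd.DataFrame({"col2": [2, 3, 4], "value": ["x", "y", "z"]})

# Inner join on differently named key columns. The shared "value" column
# is disambiguated with the default suffixes _x (left) and _y (right).
joined = left.merge(right, how="inner", left_on="col1", right_on="col2")
print(list(joined.columns))
```

Dropping or renaming the shared column beforehand, or passing suffixes= to merge, avoids the generic _x/_y names.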

You can do this in VBA as well:

Sub JoinTables()

 Dim connection As ADODB.Connection
 Set connection = New ADODB.Connection

 With connection
     .Provider = "Microsoft.Jet.OLEDB.4.0"
     .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & "Extended Properties=Excel 8.0;"
     .Open
 End With

 Dim recordset As ADODB.Recordset
 Set recordset = New ADODB.Recordset

 recordset.Open "SELECT * FROM [Sheet1$] INNER JOIN [Sheet2$] ON [Sheet1$].[type] = " & "[Sheet2$].[type]", connection

 With Worksheets("Sheet3")
     .Cells(2, 1).CopyFromRecordset recordset
 End With

 recordset.Close
 connection.Close

 End Sub
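
Neither snippet above reads the formulas themselves. For the formula-based lookup described in the question, the following is a minimal sketch, not a definitive implementation. It assumes the formulas are already available as strings (openpyxl's load_workbook returns them that way when data_only is left at its default of False) and uses small made-up frames in place of the real tabs. Series.str.extractall returns a frame indexed by (original row, match number), which is probably why merging it on the index with a single-index frame raises the error quoted in the question; dropping the "match" level lets the two indexes line up.

```python
import pandas as pd

# Hypothetical stand-in for tab1: the "value" column holds formula strings.
tab1 = pd.DataFrame({
    "name": ["Name1", "Name2"],
    "value": ["='tab2'!H10+'tab2'!H12+'tab2'!H20", "='tab2'!H11"],
})

# Hypothetical stand-in for tab2, indexed by its Excel row number.
tab2 = pd.DataFrame(
    {"A": ["a10", "a11", "a12", "a20"], "B": [1, 2, 3, 4], "H": [5.0, 6.0, 7.0, 8.0]},
    index=pd.Index([10, 11, 12, 20], name="excel_row"),
)

# One row per referenced cell; the index is (original row, match number).
refs = tab1["value"].str.extractall(r"!\$?H\$?(?P<line>\d+)")

# Drop the "match" level so the index lines up with tab1 again,
# and turn the captured row numbers into integers.
refs = refs.droplevel("match").astype({"line": "int64"})

# Repeat each name once per referenced row, then look that row up in tab2.
result = (
    tab1[["name"]]
    .join(refs)
    .merge(tab2, how="left", left_on="line", right_index=True)
    .reset_index(drop=True)
)
print(result)
```

With a real workbook, tab2 would need an index matching Excel row numbers, for example by offsetting the default index according to where the data starts on the sheet.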
