How can I fetch data from another Excel tab, using a formula as the reference, with Python and pandas (or something similar)?
First, I'm not sure pandas is the right approach here; this may be better done with VBA or another library such as openpyxl.
I have an Excel workbook with two tabs. tab1 has a name and a value, where the value is a formula such as ='tab2'!H10. tab2 holds the referenced value (or the values being summed) along with a bunch of other information.
I want to read the value column on tab1, which may reference more than one cell on the second tab, for example ='tab2'!H10 + 'tab2'!H12 + 'tab2'!H20 on the row for Name1. From that formula I want to extract the row numbers (rows 10, 12 and 20 in this example) and fetch 3 columns from tab2 for those rows.
Then I want to "join" (not sure a join is the right term) the name on tab1 with those 3 columns from tab2 for those rows. The end result should look something like this:
| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 10
| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 12
| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 20
Here is the code I'm trying. It currently fails with the error ValueError: cannot join with no overlapping index names
import numpy as np
import pandas as pd
from IPython.display import display
from openpyxl import Workbook
from openpyxl import load_workbook
wbx = load_workbook(filename= 'test.xlsx')
sheet_names = wbx.sheetnames
name1 = sheet_names[0]
sheet_ranges1 = wbx[name1]
df1 = pd.DataFrame(sheet_ranges1.values)
name2 = sheet_names[1]
sheet_ranges2 = wbx[name2]
df2 = pd.DataFrame(sheet_ranges2.values)
pd.set_option("display.max_rows", None, "display.max_columns", None)
c1 = df1.iloc[:,[1]]
c2 = df1.iloc[:,24]
print(c1.dtypes)
res = c2.str.extractall(r"!H(?P<line>\d+)?")
res2 = c1.merge(pd.DataFrame(res), how='left', left_index=True, right_index=True)
Hope it helps:
import pandas as pd
df1 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet1')
df2 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet2')
df3 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet3')
# Drop columns that should not appear in the merged result, or that would otherwise be duplicated as col_x and col_y
df1 = df1.drop(columns=['col2', 'col3'])
# Join the tables
dfx = df1.merge(df2, how="inner", left_on="col1", right_on="col2")
merged = dfx.merge(df3, how="left", left_on="col7", right_on="col3")
print(merged.head())
You can do the same in VBA:
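' Requires a reference to the Microsoft ActiveX Data Objects library (Tools > References).
Sub JoinTables()
    Dim connection As ADODB.Connection
    Set connection = New ADODB.Connection
    ' The Jet provider with "Excel 8.0" targets .xls workbooks; .xlsx files
    ' typically need Microsoft.ACE.OLEDB.12.0 with "Excel 12.0 Xml" instead.
    With connection
        .Provider = "Microsoft.Jet.OLEDB.4.0"
        .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & "Extended Properties=Excel 8.0;"
        .Open
    End With
    Dim recordset As ADODB.Recordset
    Set recordset = New ADODB.Recordset
    ' Join the two sheets on their shared [type] column
    recordset.Open "SELECT * FROM [Sheet1$] INNER JOIN [Sheet2$] ON [Sheet1$].[type] = " & "[Sheet2$].[type]", connection
    ' Write the joined rows to Sheet3, starting at cell A2
    With Worksheets("Sheet3")
        .Cells(2, 1).CopyFromRecordset recordset
    End With
    recordset.Close
    connection.Close
End Sub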
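The generic joins above do not cover the formula-parsing part of the question, so here is a minimal pandas sketch of that step. It is an illustration under assumptions, not a drop-in fix: the two DataFrames are built in memory as stand-ins for tab1 and tab2, and the column names (name, formula, excel_row, col1, col2, col3) are hypothetical. The error in the question most likely comes from str.extractall returning a MultiIndex whose second level is named "match"; dropping that level lets the extracted rows line up with the original tab1 index.

```python
import pandas as pd

# Stand-in for tab1. In practice the formula strings would come from openpyxl:
# load_workbook() without data_only=True returns formulas as text, whereas
# pd.read_excel returns only the cached values.
tab1 = pd.DataFrame({
    "name": ["Name1", "Name2"],
    "formula": ["='tab2'!H10 + 'tab2'!H12 + 'tab2'!H20", "='tab2'!H11"],
})

# Stand-in for tab2, with the Excel row number kept as an explicit column.
tab2 = pd.DataFrame({
    "excel_row": [10, 11, 12, 20],
    "col1": ["a10", "a11", "a12", "a20"],
    "col2": [1, 2, 3, 4],
    "col3": [1.5, 2.5, 3.5, 4.5],
})

# One output row per cell reference found in each formula.
# The optional \$ allows absolute references such as $H$10.
rows = tab1["formula"].str.extractall(r"!\$?H\$?(?P<excel_row>\d+)")

# extractall returns a MultiIndex (original index, "match").
# Dropping "match" leaves the original tab1 index, repeated once per reference.
rows = rows.reset_index(level="match", drop=True)
rows["excel_row"] = rows["excel_row"].astype(int)

# Attach each name to its referenced rows, then pull the tab2 columns.
refs = tab1[["name"]].join(rows)
result = refs.merge(tab2, on="excel_row", how="left")

print(result)
```

If tab2 is loaded as pd.DataFrame(sheet.values), as in the question, DataFrame index i corresponds to Excel row i + 1, so an excel_row column could be derived from the index before merging.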