
How can I fetch data from another Excel tab, using a formula as the reference, with Python and pandas (or something similar)?

First, I'm not sure pandas is the right approach here; this may be better done with VBA or another library such as openpyxl.

I have an Excel workbook with two tabs. tab1 has a name and a value, where the value is a formula such as ='tab2'!H10. tab2 holds that value (or the values being summed) along with a bunch of other information.

I want to read the value column on tab1, whose formula may reference more than one cell on the second tab, e.g. ='tab2'!H10 + 'tab2'!H12 + 'tab2'!H20 on the row for Name1. From that formula I want to extract the row numbers (10, 12 and 20 in this example) and fetch 3 columns from tab2 for those rows.

Then I want to "join" (not sure if it is really a join) the name on tab1 with those 3 columns from tab2 for those rows. The end result should look something like this:

| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 10
| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 12
| Name 1 (from tab 1 - line) | Column 1 (from tab2) | Column 2 | Column 3 | from row 20

This is the code I'm trying. It currently fails with ValueError: cannot join with no overlapping index names

import numpy as np
import pandas as pd
from IPython.display import display
from openpyxl import Workbook
from openpyxl import load_workbook

wbx = load_workbook(filename= 'test.xlsx')

sheet_names = wbx.sheetnames

name1 = sheet_names[0]
sheet_ranges1 = wbx[name1]

df1 = pd.DataFrame(sheet_ranges1.values)

name2 = sheet_names[1]
sheet_ranges2 = wbx[name2]

df2 = pd.DataFrame(sheet_ranges2.values)

pd.set_option("display.max_rows", None, "display.max_columns", None)

c1 = df1.iloc[:,[1]]
c2 = df1.iloc[:,24]
print(c1.dtypes)

res = c2.str.extractall(r"!H(?P<line>\d+)?")
res2 = c1.merge(pd.DataFrame(res), how='left', left_index=True, right_index=True)
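
A note on the error itself: `Series.str.extractall` returns one row per regex match, indexed by a MultiIndex of (original row, `match` number). Merging that on index against a frame with a plain, unnamed index is what raises "cannot join with no overlapping index names". A minimal sketch of one way around it, dropping the `match` level before joining; the two Series below are made-up stand-ins for the name and formula columns, not the real data:

```python
import pandas as pd

# Hypothetical stand-ins for the name and formula columns read from tab1.
names = pd.Series(["Name1", "Name2"], name="name")
formulas = pd.Series(
    ["='tab2'!H10+'tab2'!H12+'tab2'!H20", "='tab2'!H7"], name="formula"
)

# One row per match, indexed by (original row, match number).
rows = formulas.str.extractall(r"!H(?P<line>\d+)")

# Drop the "match" level so the index lines up with `names` again.
rows = rows.reset_index(level="match", drop=True)

# Left join on the index: each name is repeated once per referenced row.
result = names.to_frame().join(rows, how="left")

# Safe here because every formula has at least one match; rows without a
# match would hold NaN and need handling before the cast.
result["line"] = result["line"].astype(int)
print(result)
```

The trailing `?` after the capture group in the original pattern is dropped here, since it lets the group match nothing and produces missing values.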

Hope it helps:

import pandas as pd
df1 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet1')
df2 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet2')
df3 = pd.read_excel(r'.\foldername\filename.xlsx', sheet_name='sheet3')

# drop columns that should not appear in the merged result
# (this also avoids duplicate columns suffixed col_x and col_y)
df1 = df1.drop(columns=['col2', 'col3'])

# join the tables
dfx = df1.merge(df2, how="inner", left_on="col1", right_on="col2")
merged = dfx.merge(df3, how="left", left_on="col7", right_on="col3")
print(merged.head())
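
One caveat for the formula-based lookup in the question: as far as I know, `pd.read_excel` returns cell values rather than formula text, so the formula strings have to be read with openpyxl, as the question already does with `load_workbook`. Below is a minimal sketch of the whole pipeline under some assumptions: the workbook is built in memory so the example runs without a file, the sheet names `tab1` and `tab2` follow the question, and the three columns fetched from tab2 (A, B and H) and their labels `col_a`, `col_b`, `col_h` are hypothetical choices for illustration.

```python
import re

import pandas as pd
from openpyxl import Workbook

# Build a small in-memory workbook. With a real file,
# load_workbook("test.xlsx") (default data_only=False) gives the formula text.
wb = Workbook()
tab1 = wb.active
tab1.title = "tab1"
tab2 = wb.create_sheet("tab2")

tab1.append(["Name1", "='tab2'!H10+'tab2'!H12+'tab2'!H20"])
tab1.append(["Name2", "='tab2'!H12"])
for r in (10, 12, 20):
    tab2.cell(row=r, column=1, value=f"a{r}")  # column A
    tab2.cell(row=r, column=2, value=f"b{r}")  # column B
    tab2.cell(row=r, column=8, value=r * 1.5)  # column H, the referenced cell

# Matches references such as 'tab2'!H10 or tab2!$H$10 and captures the row.
REF = re.compile(r"'?tab2'?!\$?H\$?(\d+)")
# Hypothetical choice of the 3 columns to fetch from tab2 (label -> column no.).
WANTED = {"col_a": 1, "col_b": 2, "col_h": 8}

records = []
for name, formula in tab1.iter_rows(min_col=1, max_col=2, values_only=True):
    if not isinstance(formula, str):
        continue  # skip constants and empty cells
    for row_no in map(int, REF.findall(formula)):
        rec = {"name": name, "tab2_row": row_no}
        for label, col in WANTED.items():
            rec[label] = tab2.cell(row=row_no, column=col).value
        records.append(rec)

# One output row per (name, referenced tab2 row) pair.
result = pd.DataFrame(records)
print(result)
```

Looking cells up by worksheet row number avoids the off-by-one bookkeeping that comes from mapping Excel rows onto a zero-based DataFrame index.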

You can do this in VBA as well:

Sub JoinTables()

 Dim connection As ADODB.Connection
 Set connection = New ADODB.Connection

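 ' Requires a reference to "Microsoft ActiveX Data Objects" (Tools > References).
 ' Jet 4.0 with "Excel 8.0" reads .xls workbooks; for .xlsx, the ACE provider
 ' ("Microsoft.ACE.OLEDB.12.0" with "Excel 12.0 Xml") is normally needed instead.
 With connection
     .Provider = "Microsoft.Jet.OLEDB.4.0"
     .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & "Extended Properties=Excel 8.0;"
     .Open
 End With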

 Dim recordset As ADODB.Recordset
 Set recordset = New ADODB.Recordset

 recordset.Open "SELECT * FROM [Sheet1$] INNER JOIN [Sheet2$] ON [Sheet1$].[type] = " & "[Sheet2$].[type]", connection

 With Worksheets("Sheet3")
     .Cells(2, 1).CopyFromRecordset recordset
 End With

 recordset.Close
 connection.Close

 End Sub
