
Iterate through different tabs of an Excel workbook, extract data and put it into a dataframe

I have a single Excel workbook, df, that contains two tabs, Sheet1 and Sheet2. I would like to extract values from both tabs and create a new dataframe using openpyxl/Pandas.

Sheet1

      2021    2021
      q1      q2
ID    1       1
ID2   3       3
name  A       A

Sheet2

      2021    2021
      q1      q2
ID    2       2
ID2   2       2
name  B       B


Desired

quarter  year  ID  ID2  name
q1       2021  1   3    A
q1       2021  2   2    B

Doing

# Load openpyxl
import openpyxl

wb = openpyxl.load_workbook("df.xlsx")
ws1 = wb.worksheets[0]
ws2 = wb.worksheets[1]

# Loop over rows 1-3 of each sheet, stopping at the 2nd column.
# openpyxl row and column numbers are 1-based, so counting starts at 1, not 0.
for ws in (ws1, ws2):
    for row in ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=2):
        for cell in row:
            print(cell.value, end="")
        print()

I am having trouble creating a new dataframe from the values collected. Any suggestion or input is appreciated. I am still troubleshooting this.

pd.read_excel can read one specific sheet or several at once. Passing a list to sheet_name returns a dictionary of dataframes keyed by the list entries, as shown below:

import pandas as pd
dict_dfs = pd.read_excel("df.xlsx", sheet_name=[0,1])

df = pd.concat(dict_dfs)

Afterwards you can iterate over the dictionary of dataframes, or combine them directly if the layout of the sheets already allows it.
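Iterating over such a dictionary could look roughly like the sketch below. It does not read a real file: dict_dfs is a hand-built stand-in filled with the sample values from the question, and the '_' row label and the "sheet" column name are assumptions made for illustration. What pd.read_excel actually returns depends on how the workbook is laid out.

```python
import pandas as pd

# Hand-built stand-in for the dict returned by pd.read_excel(..., sheet_name=[0, 1]).
# The '_' label for the quarter row is an assumption, not something read from a file.
dict_dfs = {
    0: pd.DataFrame({"2021": {"_": "q1", "ID": 1, "ID2": 3, "name": "A"},
                     "2021.1": {"_": "q2", "ID": 1, "ID2": 3, "name": "A"}}),
    1: pd.DataFrame({"2021": {"_": "q1", "ID": 2, "ID2": 2, "name": "B"},
                     "2021.1": {"_": "q2", "ID": 2, "ID2": 2, "name": "B"}}),
}

frames = []
for sheet_key, sheet_df in dict_dfs.items():
    # Transpose so each year/quarter column becomes one row,
    # and record which sheet the rows came from ("sheet" is a hypothetical column name).
    frames.append(sheet_df.T.assign(sheet=sheet_key))

# Stack the per-sheet frames on top of each other.
combined = pd.concat(frames)
print(combined)
```

Keeping the sheet key as a column makes it possible to tell the rows apart once the sheets have been stacked.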

Example, after loading with sample data:

import pandas as pd
df1 = pd.DataFrame({'2021': {'_': 'q1', 'ID': '2', 'ID2': '2', 'name': 'B'},
 '2021.1': {'_': 'q2', 'ID': '2', 'ID2': '2', 'name': 'B'}})
df2 = pd.DataFrame({'2021': {'_': 'q1', 'ID': '1', 'ID2': '3', 'name': 'A'},
 '2021.1': {'_': 'q2', 'ID': '1', 'ID2': '3', 'name': 'A'}})


df = pd.concat([df1.T,df2.T])
df.index = df.index.str.split(".").str[0]
print(df)
#        _ ID ID2 name
# 2021  q1  2   2    B
# 2021  q2  2   2    B
# 2021  q1  1   3    A
# 2021  q2  1   3    A

The .T gives you the transposed dataframe.
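To get from the transposed frame to the layout asked for in the question, one possible follow-up is sketched below. It reuses the same sample data (listed in sheet order here) and assumes the quarter row is labelled '_'. The names sheet1, sheet2 and result are placeholders chosen for this example.

```python
import pandas as pd

# Sample data in sheet order; the '_' label for the quarter row is an assumption.
sheet1 = pd.DataFrame({"2021": {"_": "q1", "ID": "1", "ID2": "3", "name": "A"},
                       "2021.1": {"_": "q2", "ID": "1", "ID2": "3", "name": "A"}})
sheet2 = pd.DataFrame({"2021": {"_": "q1", "ID": "2", "ID2": "2", "name": "B"},
                       "2021.1": {"_": "q2", "ID": "2", "ID2": "2", "name": "B"}})

# Transpose and stack, then strip the ".1" suffix from the duplicated year label.
df = pd.concat([sheet1.T, sheet2.T])
df.index = df.index.str.split(".").str[0]

# Turn the index into a "year" column and give the quarter column a readable name.
result = df.rename_axis("year").reset_index().rename(columns={"_": "quarter"})

# Keep only the q1 rows and put the columns in the desired order.
result = result.loc[result["quarter"] == "q1",
                    ["quarter", "year", "ID", "ID2", "name"]].reset_index(drop=True)
print(result)
```

Filtering on the quarter column after the reshape keeps the earlier steps generic, so the same code can return q2 by changing a single value.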
