Merging Excel sheets in a single workbook on column value into a pandas dataframe

I need to take multiple worksheets in an Excel workbook and merge them into a single dataframe based on a set of column values in each sheet.

I have:

Sheet 1:
ID  A  B  C
1   0  l  g
2   2  e  n
3   3  c  h

Sheet 2:

ID L  M  N
1  7  u  i
2  0  o  j
3  9  c  k

I'm looking for Sheet 3:

ID A B C L M N
1
2
3

EDIT: I'm dealing with an arbitrary number of sheets, which is what makes it complicated.

I'm new to pandas/Python/coding, but this is what I'm working with right now:

import pandas as pd
import numpy as np

def get_sheets():
    """ Get sheets to join"""
    ask = input("Are the sheets in the same workbook?".lower())
    if ask == "yes" or "y":
        file = input("Please enter the filepath for the workbook")
        df_lib = pd.read_excel(file, None)
        merged = pd.merge(df_lib.items(), how="left" on='ID')
        merged.to_csv("new_merged_data.csv")

I'm getting an error because I don't have a "right" dataframe to join on. But I'm not sure how to either break apart the collection of dataframes created by the pd.read_excel function or pass them to the pd.merge function.

You can read the two worksheets into 2 different dataframes and merge them.

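import pandas as pd

def get_sheets():
    """Get sheets to join."""
    # Lowercase the answer (not the prompt) so "Yes" and "Y" are accepted too.
    ask = input("Are the sheets in the same workbook? ").strip().lower()
    # `ask == "yes" or "y"` is always true, so test membership instead.
    if ask in ("yes", "y"):
        file = input("Please enter the filepath for the workbook: ")

        # Read each worksheet into its own dataframe.
        df1 = pd.read_excel(file, sheet_name='Sheet1')
        df2 = pd.read_excel(file, sheet_name='Sheet2')
        # Join on ID, keeping every row of df1.
        results = df1.merge(df2, on='ID', how="left")
        results.to_csv("new_merged_data.csv")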

Along with this, I noticed that your code is missing a comma between how="left" and on='ID'.
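To see what the two-sheet merge produces without needing a workbook on disk, here is a minimal sketch that rebuilds the question's sample data in memory. The names sheet1 and sheet2 are stand-ins for what pd.read_excel would return for each worksheet; they are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# In-memory stand-ins for Sheet 1 and Sheet 2 from the question.
sheet1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "A": [0, 2, 3],
    "B": ["l", "e", "c"],
    "C": ["g", "n", "h"],
})
sheet2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "L": [7, 0, 9],
    "M": ["u", "o", "c"],
    "N": ["i", "j", "k"],
})

# Left join on ID: every row of sheet1 is kept, and the matching
# columns from sheet2 are appended to the right.
results = sheet1.merge(sheet2, on="ID", how="left")
print(results)
```

The result has the columns ID, A, B, C, L, M, N, which is the layout asked for as Sheet 3.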

If you have an arbitrary number of sheets that you want to merge, you can load all sheets with the following command:

# for pandas version < 0.21.0
sheets = pd.read_excel(file_name, sheetname=None)

# for pandas version >= 0.21.0
sheets = pd.read_excel(file_name, sheet_name=None)

This will give you an ordered dict with the sheet name as key and the corresponding data frame as value.

Then you will need a list of data frames from sheets. You can obtain that using:

dfs = list(sheets.values())
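If some worksheets might not contain the ID column, a guard before merging can help, since merging on a missing key raises an error. The sketch below assumes a dict shaped like the one read_excel returns with sheet_name=None; the sheet named "Notes" and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(file_name, sheet_name=None):
# a dict mapping sheet name -> dataframe.
sheets = {
    "Sheet1": pd.DataFrame({"ID": [1, 2, 3], "A": [0, 2, 3]}),
    "Sheet2": pd.DataFrame({"ID": [1, 2, 3], "L": [7, 0, 9]}),
    "Notes": pd.DataFrame({"comment": ["no ID column here"]}),
}

# Keep only the sheets that actually have the join key.
dfs = [df for df in sheets.values() if "ID" in df.columns]

# Record which sheets were left out so they can be reported.
skipped = [name for name, df in sheets.items() if "ID" not in df.columns]
print(skipped)
```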

Once you have this, you can use the code below to merge all sheets into one data frame.

from functools import reduce

# Fold the list pairwise, joining each next sheet onto the running result by ID.
merged = reduce(lambda left, right: pd.merge(left, right, on='ID', how='left'), dfs)
merged.to_csv("new_merged_data.csv")
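The reduce step can be tried on in-memory data, as in the sketch below. The third frame and the helper name merge_on_id are hypothetical; the third frame deliberately lacks ID 3 to show what a left join does with an unmatched row.

```python
from functools import reduce

import pandas as pd

dfs = [
    pd.DataFrame({"ID": [1, 2, 3], "A": [0, 2, 3]}),
    pd.DataFrame({"ID": [1, 2, 3], "L": [7, 0, 9]}),
    # Hypothetical third sheet in which ID 3 is missing.
    pd.DataFrame({"ID": [1, 2], "X": ["p", "q"]}),
]


def merge_on_id(left, right):
    """Left-join `right` onto `left` using the shared ID column."""
    return pd.merge(left, right, on="ID", how="left")


# reduce applies merge_on_id to the first two frames, then merges
# that result with the third frame, and so on down the list.
merged = reduce(merge_on_id, dfs)
print(merged)
```

With how='left', the IDs of the first sheet decide which rows survive, so ID 3 is kept and its X value is NaN. If IDs that appear only in later sheets should also be kept, how='outer' is the option to consider. Non-key columns that share a name across sheets get _x and _y suffixes, so it may be worth renaming them before merging.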

Please try it :)
