
Merging multiple dataframes on column

I am trying to merge/join multiple DataFrames and so far I have had no luck. I found the merge method, but it works with only two DataFrames at a time. I also found this SO answer suggesting something like this:

df1.merge(df2,on='name').merge(df3,on='name')

Unfortunately, this will not work in my case, because I have 20+ dataframes.

My next idea was to use join. According to the reference, when joining multiple dataframes I need to pass a list, and I can join only on the index. So I set the index on all of the dataframes (OK, that can be done programmatically easily) and ended up with something like this:

df.join([df1,df2,df3])

Unfortunately, this approach also failed, because the other column names are the same in all dataframes. So I did the last thing I could think of, which was renaming all the columns. But when I finally joined everything:

df = pd.DataFrame()
df.join([df1,df2,df3])

I received an empty dataframe. I have no more ideas for how to join them. Can someone suggest anything else?

EDIT1:

Sample input:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr1', 'attr2'])

df1 
  name attr1 attr2
0    a     5    19
1    b    14    16
2    c     4     9

df2
  name attr1 attr2
0    a    15    49
1    b     4    36
2    c    14     9

Expected output:

df
  name attr1_1 attr2_1 attr1_2 attr2_2
0    a       5      19      15      49
1    b      14      16       4      36
2    c       4       9      14       9

Indexes might be ordered differently between dataframes, but they are guaranteed to exist in every one.

Use pd.concat:

dflist = [df1, df2]
keys = ["%d" % i for i in range(1, len(dflist) + 1)]

merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = merged.swaplevel(0, 1, 1).columns.to_series().str.join('_')

merged


Or

merged.reset_index()

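The concat approach above can be sketched end to end as follows. This is only a minimal sketch, assuming the sample frames from the question (built here with integer columns, and with the rows of the second frame shuffled on purpose). The helper name merge_on_name and the list-comprehension column flattening are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample frames modeled on the question; the second one has its rows shuffled.
df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr1': [5, 14, 4], 'attr2': [19, 16, 9]})
df2 = pd.DataFrame({'name': ['c', 'a', 'b'], 'attr1': [14, 15, 4], 'attr2': [9, 49, 36]})


def merge_on_name(frames, key='name'):
    """Hypothetical helper: align frames on `key`, suffixing columns with 1, 2, ..."""
    keys = [str(i) for i in range(1, len(frames) + 1)]
    # concat with axis=1 aligns rows on the index, so row order does not matter.
    merged = pd.concat([f.set_index(key) for f in frames], axis=1, keys=keys)
    # Columns are now a MultiIndex of (frame key, attribute); flatten to "attr1_1" etc.
    merged.columns = [f'{attr}_{k}' for k, attr in merged.columns]
    return merged.reset_index()


merged = merge_on_name([df1, df2])
print(merged)
```

Because alignment happens on the index, this fits the guarantee in the question that every name exists in each frame even if the order differs.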

Use reduce:

from functools import reduce  # reduce is not a builtin in Python 3

def my_merge(df1, df2):
    return df1.merge(df2,on='name')

final_df = reduce(my_merge, df_list)

where df_list is a list of your dataframes.
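One caveat with a plain reduce: since every frame shares the attr1/attr2 column names, chained merges rely on pandas' automatic _x/_y suffixes, and with more than a few frames those suffixed names start to repeat. Depending on the pandas version this either yields duplicate column names or raises an error. A minimal sketch of one way around it, assuming you are free to rename the columns first; add_suffix_except_key and merge_all are hypothetical helper names:

```python
from functools import reduce

import pandas as pd


def add_suffix_except_key(df, suffix, key='name'):
    """Hypothetical helper: append `suffix` to every column except the join key."""
    return df.rename(columns={c: f'{c}_{suffix}' for c in df.columns if c != key})


def merge_all(frames, key='name'):
    # Give each frame unique column names up front so merge never needs suffixes.
    renamed = [add_suffix_except_key(f, i, key) for i, f in enumerate(frames, start=1)]
    return reduce(lambda left, right: left.merge(right, on=key), renamed)


# Five small frames sharing the same column names, as in the question.
df_list = [
    pd.DataFrame({'name': ['a', 'b', 'c'], 'attr1': [i, i + 1, i + 2], 'attr2': [i * 10] * 3})
    for i in range(5)
]
final_df = merge_all(df_list)
print(final_df.columns.tolist())
```

The default inner merge keeps only names present in every frame, which matches the guarantee stated in the question.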

The solution of @piRSquared works for 20+ dataframes; the following script creates 20+ example dataframes to try it on:

import numpy as np
import pandas as pd

N = 25
dflist = []

for d in range(N):
    df = pd.DataFrame(np.random.rand(3,2))
    df.columns = ['attr1', 'attr2']

    df['name'] = ['a', 'b', 'c']

    dflist.append(df)
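To check that the concat approach handles these frames, one might run something like the following. This is a minimal sketch: the seeded generator, the list-comprehension column flattening, and the shape check are additions for illustration, not part of the original answers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
N = 25
dflist = []
for _ in range(N):
    df = pd.DataFrame(rng.random((3, 2)), columns=['attr1', 'attr2'])
    df['name'] = ['a', 'b', 'c']
    dflist.append(df)

keys = [str(i) for i in range(1, N + 1)]
merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
# Flatten the (frame key, attribute) MultiIndex into names like "attr1_1".
merged.columns = [f'{attr}_{k}' for k, attr in merged.columns]

# One row per name and two columns per input frame.
print(merged.shape)  # (3, 50)
```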
