
Join two same columns from two dataframes, pandas

I am looking for the fastest way to join columns with the same names using a separator. My dataframes:

df1:
A,B,C,D
my,he,she,it

df2:
A,B,C,D
dog,cat,elephant,fish

Expected output:

df:
A,B,C,D
my:dog,he:cat,she:elephant,it:fish

As you can see, I want to merge columns with the same names, combining the two cells into one. I can use this code for column A:

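# Merge on the index so the shared column names get the _x / _y suffixes
df = df1.merge(df2, left_index=True, right_index=True)
df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)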

In my real dataset I have more than 30 columns, and I don't want to write the same line for each of them. Is there a faster way to get my expected output?

How about concat and groupby?

df3 = pd.concat([df1,df2],axis=0)
df3 = df3.groupby(df3.index).transform(lambda x : ':'.join(x)).drop_duplicates()
print(df3)
         A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
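A possible variation on the same idea, sketched under the assumption that both frames share the same row labels and contain only strings: aggregating each group with `':'.join` collapses it to a single row directly, so the `drop_duplicates()` step is not needed. The name `joined` is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Stack the frames, then join the strings that share a row label.
# concat keeps df1's rows ahead of df2's, so df1's value comes first in each pair.
joined = pd.concat([df1, df2]).groupby(level=0).agg(':'.join)
print(joined)
```

Because each row label yields exactly one output row, two rows that happen to have identical joined text are both kept, which `drop_duplicates()` would not do.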

How about this?

df3 = df1 + ':' + df2
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish

This is good because if there are columns that don't match, you get NaN, so you can filter them out later if you want:

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
df1 + ':' + df2
        A       B             C        D    E    F
0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
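If the all-NaN columns are not wanted in the first place, one option (a sketch, not the only way) is to restrict both frames to the column names they share before adding. `Index.intersection` does this; the variable names `common`, `joined` and `filtered` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'],
                    'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Column names present in both frames.
common = df1.columns.intersection(df2.columns)

# Join only the shared columns, so no NaN columns are produced.
joined = df1[common] + ':' + df2[common]
print(joined)

# Alternatively, join everything and drop the columns that came out entirely NaN.
filtered = (df1 + ':' + df2).dropna(axis=1, how='all')
print(filtered)
```

The first form avoids creating the unmatched columns at all; the second keeps the one-line join and cleans up afterwards.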

You can do this by simply adding the two dataframes with a separator.

import pandas as pd

df1 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
df2 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])

df1["A"] = "my"
df1["B"] = "he"
df1["C"] = "she"
df1["D"] = "it"
df2["A"] = "dog"
df2["B"] = "cat"
df2["C"] = "elephant"
df2["D"] = "fish"

print(df1)
print(df2)

df3 = df1 + ':' + df2
print(df3)

This will give you a result like:

    A   B    C   D
0  my  he  she  it
     A    B         C     D
0  dog  cat  elephant  fish
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish

Is this what you are trying to achieve? Note that this only works if both dataframes have the same columns; any extra columns will contain NaN. What do you want to do with the columns that differ between df1 and df2? Please comment below to help me understand your problem better.
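A related point worth checking on real data: `+` aligns on row labels as well as on column names, so frames with the same columns but different indexes also produce NaN. A minimal sketch, assuming the rows should be paired by position; the index values 0 and 5 and the names `misaligned` and `paired` are invented for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he']}, index=[0])
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat']}, index=[5])

# Row labels 0 and 5 do not match, so every cell in the result is NaN.
misaligned = df1 + ':' + df2
print(misaligned)

# Resetting both indexes pairs the rows by position instead.
paired = df1.reset_index(drop=True) + ':' + df2.reset_index(drop=True)
print(paired)
```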

You can simply do:

df = df1 + ':' + df2
print(df)

This is simple and effective.

This should be your answer.
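One assumption behind this one-liner is that every cell is a string. If some columns hold numbers, a possible workaround (a sketch; the integer column `B` below is invented for the example) is to cast both frames with `astype(str)` first.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': [1]})
df2 = pd.DataFrame({'A': ['dog'], 'B': [2]})

# Casting to str first means the integer column B is joined as text
# rather than failing on int + str.
df = df1.astype(str) + ':' + df2.astype(str)
print(df)
```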

