Join two same columns from two dataframes, pandas
I am looking for the fastest way to join columns with the same names using a separator. My dataframes:
df1:
A,B,C,D
my,he,she,it
df2:
A,B,C,D
dog,cat,elephant,fish
Expected output:
df:
A,B,C,D
my:dog,he:cat,she:elephant,it:fish
As you can see, I want to merge columns with the same names, combining two cells into one. I can use this code for column A:
df = df1.merge(df2, left_index=True, right_index=True)  # join row by row so overlapping columns get _x/_y suffixes
df['A'] = df[['A_x', 'A_y']].apply(lambda x: ':'.join(x), axis=1)
In my real dataset I have more than 30 columns, and I don't want to write the same lines for each of them. Is there a faster way to get my expected output?
How about concat and groupby?
df3 = pd.concat([df1,df2],axis=0)
df3 = df3.groupby(df3.index).transform(lambda x : ':'.join(x)).drop_duplicates()
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
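A self-contained variant of the same idea, as a sketch only: it assumes both frames share the same row labels and contain only strings, and it uses groupby(level=0).agg rather than transform plus drop_duplicates, so each index label collapses directly into one row. The names stacked and joined are illustrative and not from the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Stack the frames; rows that belong together keep the same index label.
stacked = pd.concat([df1, df2], axis=0)

# Group by index label and join each column's strings with ':'.
joined = stacked.groupby(level=0).agg(':'.join)
print(joined)
```

Because the grouping key is the index label, this only pairs the right rows when df1 and df2 are labelled consistently.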
How about this?
df3 = df1 + ':' + df2
print(df3)
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
This is good because if there are columns that don't match, you get NaN, so you can filter them out later if you want:
df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it'], 'E': ['another'], 'F': ['and another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})
df1 + ':' + df2
        A       B             C        D    E    F
0  my:dog  he:cat  she:elephant  it:fish  NaN  NaN
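One possible way to do that later filtering, sketched on a smaller pair of frames. Which option fits depends on what the unmatched columns should become, which the question does not specify. The names only_matched and with_fallback are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'E': ['another']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat']})

# Column E has no partner in df2, so it comes out as NaN.
joined = df1 + ':' + df2

# Option 1: drop the columns that found no partner.
only_matched = joined.dropna(axis=1, how='all')

# Option 2: keep them, falling back to the original df1 values.
with_fallback = joined.fillna(df1)

print(only_matched)
print(with_fallback)
```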
You can do this by simply adding the two dataframes with a separator.
import pandas as pd
df1 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
df2 = pd.DataFrame(columns=["A", "B", "C", "D"], index=[0])
df1["A"] = "my"
df1["B"] = "he"
df1["C"] = "she"
df1["D"] = "it"
df2["A"] = "dog"
df2["B"] = "cat"
df2["C"] = "elephant"
df2["D"] = "fish"
print(df1)
print(df2)
df3 = df1 + ':' + df2
print(df3)
This will give you a result like:
    A   B    C   D
0  my  he  she  it
     A    B         C     D
0  dog  cat  elephant  fish
        A       B             C        D
0  my:dog  he:cat  she:elephant  it:fish
Is this what you are trying to achieve? Note that this only works if both dataframes have the same columns; any extra columns will contain NaNs. What do you want to do with the columns that are not the same in df1 and df2? Please comment below to help me understand your problem better.
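A related point worth checking: the addition aligns on row labels as well as column names, so frames whose indexes differ also produce NaN. The sketch below assumes the rows should be paired by position; the index value 7 is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he']}, index=[0])
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat']}, index=[7])  # different row label

# Labels 0 and 7 do not match, so every cell of the result is NaN.
misaligned = df1 + ':' + df2

# Resetting both indexes pairs the rows by position instead of by label.
aligned = df1.reset_index(drop=True) + ':' + df2.reset_index(drop=True)

print(misaligned)
print(aligned)
```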
You can simply do:
df = df1 + ':' + df2
print(df)
This is simple and effective.
This should be your answer.
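If an explicit per-column form is preferred, for example to keep the separator as a named argument, a loop over the column names with Series.str.cat gives the same result without repeating a line for each column. This is a sketch that assumes both frames have the same columns and row labels; it is not claimed to be faster than the plain addition.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['my'], 'B': ['he'], 'C': ['she'], 'D': ['it']})
df2 = pd.DataFrame({'A': ['dog'], 'B': ['cat'], 'C': ['elephant'], 'D': ['fish']})

# Series.str.cat joins two string columns element-wise with a separator.
df = pd.DataFrame({col: df1[col].str.cat(df2[col], sep=':') for col in df1.columns})
print(df)
```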