Merging two dataframes on similar columns

I have the following two dataframes, which are snippets from a larger dataset:

df1: 
date key    number 
2000  1      50
2001  1      40
2000  2      600
2001  2      650

df2:
key   key2
1       A
2       B 
3       C

I want to add the key2 column to df1, matched on "key". The result should look like the following:

date key    number    key2
2000  1      50        A
2001  1      40        A
2000  2      600       B
2001  2      650       B

To do this, I am using the following command:

result = pd.merge(df1, df2, how="left", on="key")

However, this also adds the key2 value "C" to the dataset, which I do not want. I only want the key2 variable to be appended to df1 based on the keys of df1. Information in df2 that does not match a key in df1 should be dropped. Therefore, my result dataframe should have one more column than df1 and exactly the same number of rows.

Does anybody know why the "left" merge does not work here? If I run the code like this, my result dataframe has one more column, as desired, but also more rows than df1, which I do not want.

You can use pd.Series.replace:

In [242]: df1['key2'] = df1.key.replace(dict(df2.values)); df1
Out[242]: 
   date  key  number key2
0  2000    1      50    A
1  2001    1      40    A
2  2000    2     600    B
3  2001    2     650    B
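
A related option, sketched here as an alternative rather than as part of the original answer, is pd.Series.map. The difference is in how unmatched keys are handled: replace leaves a df1 key that is missing from df2 unchanged (so the raw key would end up in key2), while map yields NaN for it. The sketch rebuilds the example frames from the question and assumes the keys in df2 are unique; the names lookup and mapped are illustrative only.

```python
import pandas as pd

# Example frames rebuilt from the question.
df1 = pd.DataFrame({
    "date": [2000, 2001, 2000, 2001],
    "key": [1, 1, 2, 2],
    "number": [50, 40, 600, 650],
})
df2 = pd.DataFrame({"key": [1, 2, 3], "key2": ["A", "B", "C"]})

# Build a key -> key2 lookup Series (assumes df2 keys are unique).
lookup = df2.set_index("key")["key2"]

# map() looks up each df1 key; keys absent from df2 would become NaN.
mapped = df1.assign(key2=df1["key"].map(lookup))
print(mapped)
```

Because the lookup only runs over the rows of df1, the result has the same number of rows as df1, and the df2 entry for key 3 is never used.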

You can also use df.merge, specifying the left_on and right_on columns for the merge:

In [251]: df1.merge(df2, left_on='key', right_on='key')
Out[251]: 
   date  key  number key2
0  2000    1      50    A
1  2001    1      40    A
2  2000    2     600    B
3  2001    2     650    B

In fact, you can omit the keyword arguments; pd.merge(df1, df2) also works (for your example).
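
For what it's worth, on the example data a left merge already keeps exactly the rows of df1 and does not bring in "C". One possible explanation for the extra rows in the full dataset, which the question does not confirm, is that df2 contains repeated keys there. The sketch below uses a hypothetical df2_dup frame to show the effect, and the validate argument of pd.merge to make pandas raise an error when the right-hand keys are not unique.

```python
import pandas as pd

# Example frames rebuilt from the question.
df1 = pd.DataFrame({
    "date": [2000, 2001, 2000, 2001],
    "key": [1, 1, 2, 2],
    "number": [50, 40, 600, 650],
})
df2 = pd.DataFrame({"key": [1, 2, 3], "key2": ["A", "B", "C"]})

# Left merge: all df1 rows are kept, df2 rows with no match in df1 are dropped.
# validate="many_to_one" raises MergeError if keys in df2 are not unique.
result = pd.merge(df1, df2, how="left", on="key", validate="many_to_one")
print(result)

# Hypothetical df2 with a repeated key, to show how extra rows can appear.
df2_dup = pd.DataFrame({"key": [1, 1, 2], "key2": ["A", "A2", "B"]})
inflated = pd.merge(df1, df2_dup, how="left", on="key")
print(len(df1), len(inflated))
```

If repeated keys do turn out to be the cause, df2.drop_duplicates("key") before merging is one way to keep the row count equal to that of df1, at the cost of choosing which duplicate to keep.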

Thanks for the replies. I actually got it done via:

result = df1.join(df2, how="left", on="key", lsuffix='_', rsuffix='_')

I don't know why this does not yield the same result as merge...
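
A likely reason for the difference: DataFrame.join with on="key" matches the key column of df1 against the index of df2, not against df2's key column. With a default integer index, the keys of df1 are therefore compared with the row labels 0, 1, 2 of df2. A minimal sketch of a join that should line up with the left merge, assuming df2's keys are unique, moves key into the index first (the name joined is illustrative):

```python
import pandas as pd

# Example frames rebuilt from the question.
df1 = pd.DataFrame({
    "date": [2000, 2001, 2000, 2001],
    "key": [1, 1, 2, 2],
    "number": [50, 40, 600, 650],
})
df2 = pd.DataFrame({"key": [1, 2, 3], "key2": ["A", "B", "C"]})

# join() matches df1["key"] against the *index* of the other frame,
# so df2's key column is moved into its index before joining.
joined = df1.join(df2.set_index("key"), on="key", how="left")
print(joined)
```

Since key is no longer a regular column of the right-hand frame, there are no overlapping column names and the lsuffix and rsuffix arguments are not needed.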
