
Pandas merge not keeping 'on' column

I'm trying to merge two dataframes in pandas on a common column name (`orderid`). The resulting (merged) dataframe contains only one `orderid` column: the `orderid` from the second dataframe is not kept as a separate column. Per the documentation, the 'on' column should be kept unless you explicitly tell it not to.

```python
import pandas as pd
df = pd.DataFrame([[1,'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1,200], [2, 300], [3, 400], [4,500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)
pd.merge(df, df2, on='orderid', how='outer', copy=True, suffixes=('_left', '_right'))
```

Which outputs this:

|      | orderid | ordervalue_left | ordervalue_right |
|------|---------|-----------------|------------------|
| 0    | 1       | a               | 200              |
| 1    | 2       | b               | 300              |
| 2    | 3       | c               | 400              |
| 3    | 4       | NaN             | 500              |
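A quick check on the same inputs shows what `on` actually does: pandas does not drop the key, it folds both `orderid` columns into a single `orderid` column (holding the key from whichever side has it), and only the overlapping non-key column `ordervalue` receives the suffixes. This is a minimal sketch; `copy=True` is left out because it does not change the result.

```python
import pandas as pd

# Same inputs as in the question (orderid stored as strings)
df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1, 200], [2, 300], [3, 400], [4, 500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)

merged = pd.merge(df, df2, on='orderid', how='outer', suffixes=('_left', '_right'))

# With `on`, the two key columns become ONE 'orderid' column;
# the suffixes are applied only to the overlapping non-key column.
print(list(merged.columns))  # ['orderid', 'ordervalue_left', 'ordervalue_right']

# The key for the right-only row ('4') is still present; only the left-side value is missing.
print(merged[merged['orderid'] == '4'])
```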

What I am trying to create is this:

|      | orderid_left | ordervalue_left | orderid_right | ordervalue_right |
|------|--------------|-----------------|--------------|------------------|
| 0    | 1            | a               | 1            | 200              |
| 1    | 2            | b               | 2            | 300              |
| 2    | 3            | c               | 3            | 400              |
| 3    | NaN          | NaN             | 4            | 500              |

How should I write this?

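Rename the `orderid` columns so that `df` has a column named `orderid_left` and `df2` has a column named `orderid_right`, then merge with `left_on`/`right_on` instead of `on`. Because the two key columns now have different names, pandas keeps both of them rather than combining them into one: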

```python
import pandas as pd
df = pd.DataFrame([[1,'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1,200], [2, 300], [3, 400], [4,500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)

# Give each key column a distinct name so both survive the merge
df = df.rename(columns={'orderid':'orderid_left'})
df2 = df2.rename(columns={'orderid':'orderid_right'})
result = pd.merge(df, df2, left_on='orderid_left', right_on='orderid_right',
                  how='outer', suffixes=('_left', '_right'))
print(result)
```

yields:

```
  orderid_left ordervalue_left orderid_right  ordervalue_right
0            1               a             1               200
1            2               b             2               300
2            3               c             3               400
3          NaN             NaN             4               500
```
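A variation on the same idea, offered as a sketch rather than as the answer's own method: `DataFrame.add_suffix` renames every column at once, so both key columns and both value columns end up with distinct names before the merge, and the `suffixes` argument is no longer needed. The variable names `left` and `right` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['orderid', 'ordervalue'])
df['orderid'] = df['orderid'].astype(str)
df2 = pd.DataFrame([[1, 200], [2, 300], [3, 400], [4, 500]], columns=['orderid', 'ordervalue'])
df2['orderid'] = df2['orderid'].astype(str)

# add_suffix renames ALL columns: orderid -> orderid_left, ordervalue -> ordervalue_left, etc.
left = df.add_suffix('_left')
right = df2.add_suffix('_right')

# No column names overlap any more, so both key columns are kept as-is
result = pd.merge(left, right, left_on='orderid_left', right_on='orderid_right', how='outer')
print(result)
```

Compared with `rename`, this avoids listing columns by hand, at the cost of suffixing every column even when only the key needs it.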
