
Merging data frames in pandas

pandas.merge behaves differently for the left and right sides! On the left side, using left_on together with left_index raises an error, but the same combination on the right side works!

Code:

import pandas as pd
import numpy as np
right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])
left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})
pd.merge(left, right, right_index=True, left_index=True, right_on='event1')  # works and returns an empty table, which is expected
pd.merge(left, right, left_index=True, right_index=True, left_on='key1')  # raises an error!

You have a few issues going on. First, your merge statements are not constructed correctly. You shouldn't use left_on and left_index, or right_on and right_index, at the same time. Use exactly one left option and one right option.
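A minimal sketch of the "one left option, one right option" rule, rebuilding the frames from the question. Assumption: a recent pandas (1.x or 2.x), which, as far as I know, validates these arguments up front and raises pandas.errors.MergeError (a subclass of ValueError) for both of the question's calls, instead of the inconsistent behaviour described above. try_merge is a hypothetical helper written only for this illustration.

```python
import numpy as np
import pandas as pd

# Same frames as in the question: `right` has a two-level (state, year) index.
right = pd.DataFrame(
    data=np.arange(12).reshape((6, 2)),
    index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
           [2001, 2000, 2000, 2000, 2001, 2002]],
    columns=['event1', 'event2'],
)
left = pd.DataFrame({
    'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'key2': [2000, 2001, 2002, 2001, 2002],
    'data': np.arange(5.),
})


def try_merge(**kwargs):
    """Hypothetical helper: return the merge result, or the exception pandas raises."""
    try:
        return pd.merge(left, right, **kwargs)
    except ValueError as exc:  # pandas.errors.MergeError subclasses ValueError
        return exc


# The two over-specified calls from the question (two options on one side each).
results = {
    'right_index + right_on': try_merge(left_index=True, right_index=True, right_on='event1'),
    'left_index + left_on': try_merge(left_index=True, right_index=True, left_on='key1'),
}
for label, outcome in results.items():
    print(f'{label}: {type(outcome).__name__}: {outcome}')
```

On older pandas versions the first call may still succeed silently, which is exactly the asymmetry the question ran into.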

The reason your second statement raises an error is that the index levels do not match. In your first statement, the left index has a single level, and although you specify both right_index=True and right_on='event1', the right_on argument takes precedence. Since both keys are single-level integers, there is no problem. I should also point out that, when constructed correctly (pd.merge(left, right, left_index=True, right_on='event1', how='left')), this merge does not produce an empty DataFrame; see the code below.
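One way to confirm that the correctly specified merge is not empty is merge's indicator=True flag, which adds a _merge column recording whether each row found a partner. A sketch under the assumption of a recent pandas; since the row labels of this kind of index-to-column merge have varied between versions, it inspects only columns.

```python
import numpy as np
import pandas as pd

# Frames from the question.
right = pd.DataFrame(
    data=np.arange(12).reshape((6, 2)),
    index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
           [2001, 2000, 2000, 2000, 2001, 2002]],
    columns=['event1', 'event2'],
)
left = pd.DataFrame({
    'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'key2': [2000, 2001, 2002, 2001, 2002],
    'data': np.arange(5.),
})

# One left option (the left index) and one right option (the 'event1' column).
merged = pd.merge(left, right, left_index=True, right_on='event1',
                  how='left', indicator=True)

# Left index values 0, 2 and 4 equal event1 values 0, 2 and 4;
# 1 and 3 have no partner, so their event2 becomes NaN.
print(merged[['data', 'event2', '_merge']])
print(merged['_merge'].value_counts())
```

Three of the five rows are marked both, matching the non-NaN rows in the transcript.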

In your second statement, you use the right index via right_index=True, and left_on takes precedence over left_index=True. The problem here is that the right index has two levels, whereas your 'key1' field is a single-level string column.
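Because right has a two-level index, the left side needs one key column per index level. A sketch that passes both key1 and key2 as left_on together with right_index=True. Assumption: a recent pandas, which raises a ValueError when the number of left_on keys does not equal right.index.nlevels, so the single-key version is shown failing for comparison.

```python
import numpy as np
import pandas as pd

# Frames from the question.
right = pd.DataFrame(
    data=np.arange(12).reshape((6, 2)),
    index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
           [2001, 2000, 2000, 2000, 2001, 2002]],
    columns=['event1', 'event2'],
)
left = pd.DataFrame({
    'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'key2': [2000, 2001, 2002, 2001, 2002],
    'data': np.arange(5.),
})

# One left key per level of right's (state, year) index.
# (Ohio, 2000) matches two rows; (Nevada, 2002) has no match and gets NaN.
merged = pd.merge(left, right, left_on=['key1', 'key2'], right_index=True, how='left')
print(merged)

# A single key against a two-level index is rejected.
try:
    pd.merge(left, right, left_on='key1', right_index=True)
    single_key_error = None
except ValueError as exc:
    single_key_error = exc
print(single_key_error)
```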

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])

In [4]: left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})

In [5]: left
Out[5]:
   data    key1  key2
0     0    Ohio  2000
1     1    Ohio  2001
2     2    Ohio  2002
3     3  Nevada  2001
4     4  Nevada  2002

In [6]: right
Out[6]:
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11

In [5]: left_merge = left.merge(right, left_index=True, right_on='event1', how='left')

In [7]: left_merge
Out[7]:
             data    key1  key2  event1  event2
Nevada 2001     0    Ohio  2000       0       1
Ohio   2002     1    Ohio  2001       1     NaN
Nevada 2000     2    Ohio  2002       2       3
Ohio   2002     3  Nevada  2001       3     NaN
       2000     4  Nevada  2002       4       5
